Senior Cloud Engineer

RoshAi

Kochi, Kerala, India • Full-Time

Posted 3 days ago • Apply by June 30, 2026

Job Description

Role Overview

We are looking for a Senior Cloud Engineer who can design, build, and scale complex cloud and hybrid infrastructure from scratch (0 → production → scale).

This role is not limited to cloud provisioning. You will own:

• Multi-cloud architecture (Azure, AWS, GCP)

• End-to-end DevSecOps and MLOps platforms

• Inference and training infrastructure

• Robotics/edge deployment pipelines (OTA)

• Scalable SaaS platform architecture

• Hybrid observability and monitoring systems

Key Responsibilities

1. Cloud Architecture & Platform Engineering

• Architect and implement multi-cloud environments (AWS, Azure, GCP) across IaaS, PaaS, and SaaS workloads

• Design landing zones, networking (VPC/VNet), IAM, storage, and security baselines

• Build highly available, fault-tolerant systems across regions and clouds

• Optimize for cost, performance, and scalability, aligned with Well-Architected Frameworks

• Manage cloud subscriptions/accounts and governance

• Implement Cloud Security Posture Management (CSPM)

2. DevSecOps & Platform Automation

• Design and implement secure CI/CD pipelines (GitOps preferred)

• Integrate:

SAST, DAST, container scanning, IaC scanning, VAPT
Secrets management (Vault, KMS, etc.)

• Enforce policy-as-code (OPA, Azure Policy, AWS SCPs)

• Automate infrastructure provisioning using Terraform (mandatory)

• Implement vulnerability scanning and patch management processes

3. MLOps & AI Infrastructure

• Build and manage:

Model training pipelines
Inference clusters (real-time and batch)

• Deploy models using:

Kubernetes (AKS/EKS/GKE)
Serverless or GPU-based inference systems

• Implement:

Model versioning
Experiment tracking
CI/CD for ML workflows

• Optimize GPU utilization and cost efficiency

• Support data annotation tools and workflows

• Hands-on exposure to:

Azure AI / AI Foundry
AWS SageMaker
GCP Vertex AI

4. Robotics DevOps & OTA Systems

• Design pipelines for robotics and edge device deployments

• Implement OTA (Over-the-Air) update systems

• Handle:

Intermittent connectivity
Edge-to-cloud synchronization

• Work with:

ROS/ROS2 environments (preferred)
Containerized edge workloads

5. SaaS Platform Architecture

• Architect and deploy multi-tenant SaaS platforms

• Implement:

Tenant isolation
Scalable backend services
API gateways and service meshes

• Ensure high availability and zero-downtime deployments

6. Hybrid Infrastructure (Cloud + On-Prem)

• Design and manage:

On-prem compute clusters (ML training servers)
Storage systems (NAS, object storage, distributed file systems)

• Integrate hybrid networking:

VPN / Direct Connect / ExpressRoute

• Enable workload portability across environments

• Implement:

Endpoint management and security (e.g., Intune)
Backup and disaster recovery solutions

7. Data Engineering & Pipelines

• Build scalable data pipelines for:

Streaming and batch workloads

• Work with:

Kafka / Pub/Sub / Event Hubs
Data lakes and warehouses

• Ensure:

Data reliability
Observability
Governance

8. Observability, CloudOps & FinOps

• Build centralized monitoring systems across multi-cloud and on-prem environments

• Implement:

Metrics (Prometheus, cloud-native tools)
Logging (ELK / OpenSearch)
Tracing (OpenTelemetry / Jaeger)

• Define and manage:

SLIs, SLOs, and alerting strategies

• Apply CloudOps and FinOps principles for operational efficiency and cost control

Required Skills & Experience

Experience & Ownership

• 12–15 years of total IT experience

• 8–10+ years in Cloud / Infrastructure Platform Engineering / DevOps

• Proven experience building and operating production-scale systems (0 → scale)

• Strong ownership mindset: architecture + implementation + operations

Multi-Cloud & Core Infrastructure

• Hands-on experience with AWS, Azure, and GCP (minimum 2 at strong proficiency)

• Deep understanding of:

Cloud architecture (IaaS, PaaS, SaaS)
Storage services (Amazon S3, Azure Blob Storage, Google Cloud Storage)
Networking (VPC/VNet, routing, private connectivity)
IAM, security, and governance

• Experience with:

High availability, multi-region design, and disaster recovery
Cloud cost optimization (FinOps awareness)

Infrastructure as Code & Automation

• Advanced expertise in Terraform:

Modular design, remote state management, workspaces
CI/CD integration and environment promotion

• Strong scripting skills (Python / Bash)

Kubernetes & Distributed Systems

• Production experience with Kubernetes (AKS/EKS/GKE):

Cluster architecture, scaling, and operations
Networking, ingress, and service discovery
Multi-cluster or hybrid deployments

• Strong understanding of distributed systems fundamentals

DevSecOps

• Experience building secure CI/CD pipelines:

GitHub Actions / GitLab CI / Azure DevOps

• Integration of:

SAST, DAST, container scanning, IaC scanning, VAPT

• Experience with:

Secrets management (Vault, KMS)
Policy-as-code (OPA, Azure Policy, AWS SCPs)
Vulnerability management and patching

MLOps & AI Infrastructure

• Hands-on experience with:

ML training pipelines and inference systems

• Model deployment using:

Kubernetes / GPU clusters / serverless inference

• Experience with:

Model lifecycle (versioning, CI/CD, monitoring)
GPU optimization and cost efficiency

• Exposure to:

Azure AI / AWS SageMaker / GCP Vertex AI

Hybrid Infrastructure (Cloud + On-Prem)

• Experience with:

On-prem compute and storage systems
Hybrid networking (VPN / Direct Connect / ExpressRoute)

• Backup, disaster recovery, and resilience strategies

Data Engineering & Pipelines

• Experience building:

Streaming and batch data pipelines

• Familiarity with:

Kafka / Pub/Sub / Event Hubs
Data lakes and warehouses

• Understanding of data reliability and governance

Observability & Reliability Engineering

• Hands-on with:

Prometheus, Grafana
ELK / OpenSearch
OpenTelemetry / Jaeger

• Ability to define and implement:

SLIs / SLOs / alerting

• Experience with centralized monitoring across hybrid environments

Edge / Robotics / OTA (Preferred)

• Experience with:

OTA systems and edge deployments

• Familiarity with:

ROS/ROS2 ecosystems
Containerized edge workloads

Core Engineering Fundamentals (Non-Negotiable)

• Strong Linux fundamentals

• Solid networking knowledge (L4–L7)

• Ability to debug across layers (infra → network → application)

Nice to Have

• Experience building Internal Developer Platforms (IDP)

• Multi-cluster Kubernetes management across clouds

• Experience with robotics simulation platforms (e.g., CARLA)

• Exposure to endpoint management and security tools (e.g., Intune)

Required Skills

Machine Learning, Monitoring, AWS, Microsoft Azure, Google Cloud Platform, Kubernetes, GitLab, Prometheus, Grafana, IAM, Cloud Architecture, Cloud Cost Optimization, Amazon S3, Azure DevOps, GitHub Actions, Apache Kafka, ROS, CI/CD, Cloud native, VPN, Disaster recovery, VPC, DAST, SAST, Adobe Illustrator, Data pipelines, OpenSearch, Container security, Amazon EKS, Model Training, Observability, OpenTelemetry, Vault, Azure Blob storage, Serverless, GCP Vertex AI
