
SRE DevOps Manager - Hybrid Mode

Bengaluru, Karnataka, India

3 weeks ago

Applicants: 0

Salary Not Disclosed

2 days left to apply

Job Description

We are looking for a Site Reliability Engineering (SRE) DevOps Manager.

Location: Bangalore / Hyderabad / Chennai / Noida / Pune / Visakhapatnam / Gurgaon
Shift timing: Regular
Joining: Immediate to 30 days

Interested candidates, please share your profile and the details below to Email ID: [email protected]

Total experience:
Relevant experience:
Current CTC:
Expected CTC:
Notice period:
If serving notice period, last working day:
Email ID: [email protected]

Job Summary

We are seeking an experienced Site Reliability Engineering (SRE) Manager to lead and evolve our cloud infrastructure, reliability practices, and automation strategy. This role blends hands-on technical leadership with strategic oversight to ensure scalable, secure, and reliable systems across AWS-based environments.

As an SRE Manager, you will guide a team of DevOps and SRE engineers to design, build, and operate cloud-native platforms leveraging Kubernetes (EKS), Terraform, and AWS DevOps tools. You will drive operational excellence through observability, automation, and AIOps, enhancing reliability, performance, and cost efficiency.

You will collaborate closely with development, product, and security teams to define SLOs, manage error budgets, and continuously improve infrastructure resilience and developer productivity.

Key Responsibilities

Leadership & Strategy
Lead, mentor, and grow a global team of Site Reliability and DevOps Engineers.
Define and drive the reliability roadmap, SLOs, and error budgets across services.
Establish best practices for infrastructure automation, observability, and incident response.
Partner with engineering leadership to shape long-term cloud, Kubernetes, and AIOps strategies.

Infrastructure & Automation
Design, implement, and manage AWS cloud infrastructure using Terraform (advanced modules, remote state management, custom providers).
Build and optimize CI/CD pipelines using AWS CodePipeline, CodeBuild, CodeDeploy, and CodeCommit.
Manage EKS clusters with a focus on scalability, reliability, and cost efficiency, leveraging Helm, ingress controllers, and service mesh (e.g., Istio).
Implement robust security and compliance practices (IAM policies, network segmentation, secrets management).
Automate environment provisioning for dev, staging, and production using Infrastructure as Code (IaC).

Monitoring, Observability & Reliability
Lead observability initiatives using Prometheus, Grafana, CloudWatch, and OpenSearch/ELK.
Improve system visibility and response times by enhancing monitoring, tracing, and alerting mechanisms.
Drive proactive incident management and root cause analysis (RCA) to prevent recurring issues.
Apply chaos engineering principles and reliability testing to ensure resilience under load.

AIOps & Advanced Operations
Integrate AIOps tools to proactively detect, diagnose, and remediate operational issues.
Design and manage scalable deployment strategies for AI/LLM workloads (e.g., Llama, Claude, Cohere).
Monitor model performance and reliability across hybrid Kubernetes and managed AI environments.
Stay current with MLOps and Generative AI infrastructure trends, applying them to production workloads.
Manage 24/7 operations using appropriate alerting tools and a follow-the-sun model.

Cost Optimization & Governance
Analyze and optimize cloud costs through instance right-sizing, auto-scaling, and spot usage.
Implement cost-aware architecture decisions and monitor monthly spend for alignment with budgets.
Establish cloud governance frameworks to enhance cost visibility and accountability across teams.

Collaboration & Process
Partner with developers to streamline deployment workflows and improve developer experience.
Maintain high-quality documentation, runbooks, and postmortem reviews.
Foster a culture of reliability, automation, and continuous improvement across teams.

Additional Information

Company Name
Infinite Computer Solutions
Industry
N/A
Department
N/A
Role Category
DevOps Engineer
Job Role
Mid-Senior level
Education
No Restriction
Job Types
Remote
Gender
No Restriction
Notice Period
Less Than 30 Days
Year of Experience
1 - Any Yrs
Job Posted On
3 weeks ago
Application Ends
2 days left to apply
