Site Reliability Engineer

Actively reviewing applications

PwC India

Bengaluru • Full-Time • 4–8 years

Posted 2 days ago • Apply by June 11, 2026

Job Description

Opportunity

We are looking for SREs who want to define what reliability means for the next generation of industrial software. The role involves defining SLIs/SLOs, building observability platforms, and establishing incident management processes.

Responsibilities

Define and implement SLI/SLO frameworks for complex engineering systems across manufacturing and industrial clients
Design and deploy observability platforms using Prometheus, Grafana, and Datadog
Establish incident management processes and lead blameless post-mortems
Implement chaos engineering practices to proactively identify system weaknesses
Drive toil elimination through automation and platform improvements
Build reliability engineering capabilities within the practice and client organisations

Essential Skills

SLI/SLO definition and implementation at enterprise scale
Observability: Prometheus, Grafana, Datadog, New Relic
Incident management and post-mortem facilitation
Chaos engineering: Gremlin, Chaos Monkey, Litmus
Python testing for reliability validation and automated runbooks
Automation and scripting: Python, Go, Bash
Cloud platforms: AWS, Azure, GCP

Experience

5–10 years in SRE or Production Engineering roles with experience in enterprise or industrial environments

Required Skills

UART SPI DDR
