Lead NOC/Site Reliability Engineer

Pune, Maharashtra, India

6 days ago

Salary Not Disclosed

3 weeks left to apply

Job Description

Job ID: Lea-ETP-Pun-1192
Location: Pune

Lead NOC/Site Reliability Engineer

Responsibilities

6+ years of experience in SRE, DevOps, or infrastructure management.
Lead the NOC/SRE team from the front, ensuring a culture of proactive monitoring, rapid response, and continuous improvement.
Act as the primary escalation point for major incidents, providing technical guidance and decision-making.
Collaborate with DevOps, Engineering, and Product teams to enhance system reliability.
Define best practices, incident response protocols, and runbooks for the team.
Lead log tracing and deep troubleshooting for infrastructure, network, and application issues.
Reduce MTTR (Mean Time to Resolution) and improve incident management processes.
Expertise in troubleshooting complex infrastructure and application issues.
Strong knowledge of log tracing, distributed tracing, and observability tools (e.g., ELK, Splunk, Grafana, Prometheus, OpenTelemetry).
Deep understanding of SLAs, SLOs, and error budgets.
Experience with cloud platforms (AWS, Azure, GCP) and container orchestration (Kubernetes, Docker).
Good knowledge of Terraform, Kubernetes, Docker, and cloud architectures.
Proficiency in monitoring and observability tools (New Relic, Prometheus, Datadog, etc.).
Understanding of CI/CD pipelines, automation, and infrastructure as code (IaC).
Basic scripting skills in Python, Go, Shell, or similar.
Strong troubleshooting skills for complex distributed systems.
Ability to mentor junior engineers and drive SRE best practices.
Willingness to primarily work during 3:30 PM to 3:30 AM IST, with flexibility to adjust shifts as needed based on operational requirements.
Strong problem-solving skills and ability to work in a fast-paced environment.
Strong incident management, troubleshooting, and RCA skills.

Qualifications

6+ years of experience in Site Reliability Engineering (SRE) / NOC / DevOps roles.
Proven leadership experience, managing or mentoring a team.
Hands-on experience with Terraform for Infrastructure as Code (IaC).
Experience in Python for automation and scripting.
Expertise in troubleshooting complex infrastructure and application issues.
Strong knowledge of log tracing, distributed tracing, and observability tools (e.g., ELK, Splunk, Grafana, Prometheus, OpenTelemetry).
Deep understanding of SLAs, SLOs, and error budgets.
Experience with cloud platforms (AWS, Azure, GCP) and container orchestration (Kubernetes, Docker).
Familiarity with CI/CD pipelines and GitOps practices.
Strong problem-solving skills and the ability to make quick, data-driven decisions under pressure.

Additional Information

Company Name
Bridgenext
Industry
N/A
Department
N/A
Role Category
SRE (Site Reliability Engineer)
Job Role
Mid-Senior level
Education
No Restriction
Job Types
On-site
Employment Types
Full-Time
Gender
No Restriction
Notice Period
Less Than 30 Days
Year of Experience
1 - Any Yrs
Job Posted On
6 days ago
Application Ends
3 weeks left to apply
