
Senior Site Reliability Engineer

Delhi NCR, Haryana, India

2 months ago

Applicants: 0

Salary Not Disclosed

3 weeks left to apply

Job Description

We are seeking a talented and motivated Senior Site Reliability Engineer (SRE) to join our organization. The SRE team at GreyOrange is responsible for monitoring the stability and availability of mission-critical production systems, managing incidents for quicker resolution, and establishing BAU. The team also manages and maintains internal tools and infrastructure consumed by other development teams. The experienced SRE will play a crucial role in capacity planning and in ensuring the reliability, scalability, and performance of our infrastructure and applications. The ideal candidate will have a strong background in software engineering, system administration, containerization, and cloud technologies.

Requirements

5 to 8 years of experience.
Well-versed in scripting/programming languages (Python, Bash, PowerShell, etc.) to automate manual work, particularly within cloud environments.
Well-versed in observability tools (Grafana, Splunk, Dynatrace) for monitoring, alerting, and logging, used to identify and address potential issues, especially in cloud infrastructure.
Working experience with automation tools (Jenkins, GitLab, Ansible/Chef for configuration management) and processes that streamline deployment, monitoring, and management of systems and applications in the cloud.
Hands-on experience with containerization and orchestration technologies such as Docker, Kubernetes, or similar, particularly in cloud-native environments.
Solid understanding of SLI, SLO, SLA, and error budget concepts and their implementation.
Able to provide on-call support and participate in incident management and response activities as needed.
Expert at troubleshooting production issues and bugs.
Good knowledge of Unix systems, networking, web technologies, and databases.
Incident management experience, coupled with effective communication skills, for production workloads.
Working knowledge of at least one cloud platform (AWS or GCP).

What you'll do:

Lead reliability engineering projects and drive them to closure.
Ensure system stability and high availability by proactively monitoring performance and troubleshooting issues.
Design, build, and maintain efficient, reliable, and scalable cloud-based infrastructure and services.
Automate processes and find opportunities to improve the observability and availability of the platform to reduce toil.
Implement and manage observability tools for comprehensive monitoring, alerting, and logging.
Own the end-to-end availability and performance of different services and tools.
Practice sustainable incident response and blameless postmortems.
Provide on-call support for incident management and participate actively in response activities.

Additional Information

Company Name
GreyOrange
Industry
N/A
Department
N/A
Role Category
N/A
Job Role
Mid-Senior level
Education
No Restriction
Job Types
On-site
Gender
No Restriction
Notice Period
Less Than 30 Days
Year of Experience
1 - Any Yrs
Job Posted On
2 months ago
Application Ends
3 weeks left to apply
