Lead Site Reliability Engineer
Advaiya Solutions, Inc
Job Description
Advaiya Solutions is a leading C&I (consulting and implementation) services company, providing business applications and analytics solutions to organizations across the world. We help businesses gain an advantage by enabling the digital workplace through identifying, architecting, building, implementing, integrating, and ensuring adoption of relevant technology solutions and innovations. We are seeking enthusiastic and smart business analysts to join our growing team.
Company Website: https://advaiya.com
Job Profile: Lead Site Reliability Engineer
Experience Required: 10 to 15 Years
Job Location: Ahmedabad - On Site
Role Overview:
The Site Reliability Engineering (SRE) Lead will oversee reliability, performance, and operational excellence across a complex, multi-environment cloud infrastructure. This role combines leadership and deep technical expertise to manage a team of SREs providing L3 support for mission-critical applications and platforms.
The ideal candidate will champion reliability engineering practices—automation, observability, scalability, and resilience—while ensuring proactive cost management, compliance, and performance optimization across Azure environments.
Key Responsibilities:
Leadership and Management:
- Lead and mentor a team of site reliability engineers providing 24x7 support.
- Define and implement SRE strategy, aligning with business and technology goals.
- Drive continuous improvement in incident response, deployment reliability, and automation.
- Coordinate with application, DevOps, security, and infrastructure teams to ensure end-to-end operational integrity.
- Conduct capacity planning, post-mortems, and root cause analysis to enhance service resilience.
Reliability Operations and Automation:
- Establish and maintain automation frameworks for patching, monitoring, scaling, and deployment.
- Ensure optimal performance and uptime of VMs, app services, and cloud-native components.
- Develop and enforce incident management protocols, including escalation, resolution, and documentation.
- Oversee CI/CD pipelines in Jenkins and Azure DevOps, ensuring consistent and secure deployment processes.
- Implement infrastructure-as-code principles for reproducible, scalable environments.
Monitoring, Alerting, and Observability:
- Architect and manage end-to-end observability using Azure Monitor, Application Insights, Log Analytics, and Grafana.
- Develop automated dashboards and alerts to detect anomalies and prevent service degradation.
- Lead performance reviews and generate periodic reports on system health, utilization, and SLA adherence.
Cloud and Infrastructure Management:
- Manage multi-subscription Azure environments across development, staging, performance, and production.
- Oversee VM patching, scaling, certificate renewal, DNS, and backup operations.
- Troubleshoot and resolve complex networking issues involving latency, routing, or connectivity.
- Collaborate with security teams to manage SOC/SIEM alerts, vulnerability remediation, and PAM operations.
FinOps and Cost Optimization:
- Implement FinOps practices to track, analyze, and optimize cloud spend.
- Produce monthly cost reports with insights by service, application, and environment.
- Recommend cost-control measures such as rightsizing, automation of idle resource cleanup, and reserved instance management.
Change and Release Management:
- Lead Change Advisory Board (CAB) processes and ensure controlled, risk-mitigated production deployments.
- Support weekly release cycles with coordination across application and infrastructure teams.
- Maintain rollback strategies and ensure pre-deployment validations are automated and repeatable.