Senior Site Reliability Engineer (SRE)
Salla
Actively reviewing applications
Saudi Arabia
Full-Time
On-site
Posted 3 weeks ago • Apply by April 21, 2026
Job Description
As a Senior SRE at Salla, you will lead reliability initiatives, handle complex incidents, improve platform performance, and guide engineering teams toward building resilient systems. You will also participate in the on-call rotation as part of our commitment to platform reliability.
Reliability & Incident Management
- Lead high-severity incident response and drive post-incident reviews
- Troubleshoot complex issues across applications, infrastructure, and networks
- Improve MTTR through better monitoring, alerts, and diagnostic tooling
- Participate in the on-call rotation supporting production systems
- Identify and resolve performance bottlenecks and scaling challenges
- Conduct load testing and capacity planning for high-traffic scenarios
- Enhance cloud-native infrastructure, deployment processes, and automation
- Improve resilience, fault-tolerance, and recovery mechanisms across systems
- Build and refine dashboards, alerts, metrics, logs, and traces
- Define SLIs/SLOs and improve visibility into system behavior
- Develop tools that reduce operational toil and increase reliability
- Contribute to infrastructure-as-code, CI/CD pipelines, and GitOps workflows
- Work closely with engineering teams to ensure services are robust and production-ready
- Mentor engineers on reliability, debugging, and operational best practices
Qualifications
- Background in large-scale, high-traffic systems
- Experience with fault-tolerant design, DR, and HA patterns
- Familiarity with SLOs, SLIs, and error budgets
- Preference for candidates located in time zones GMT+0 to GMT+6, to align with team collaboration and on-call coverage
- Strong experience with Kubernetes, service mesh technologies, and cloud platforms (AWS, GCP, or Azure)
- Deep understanding of Linux, networking, distributed systems, and load balancing
- Hands-on experience with Terraform or similar Infrastructure-as-Code tools
- Experience with observability platforms such as Prometheus, Grafana, Loki, Mimir, Elastic, or equivalent
- Proficiency in scripting or programming languages such as Bash, Python, or Go
- Experience with CI/CD pipelines and GitOps practices
- Strong debugging, incident response, and performance analysis skills