Site Reliability Engineer - Datadog
Hyderabad, Telangana, India
2 months ago
2 weeks left to apply
Job Description
Site Reliability Engineering (SRE), Datadog, Kubernetes, Terraform, Infrastructure as Code (ARM templates, Terraform, Bicep), Automation, Monitoring Tools, Incident Management
Description
GSPANN is hiring a Site Reliability Engineer with expertise in Datadog to design, automate, and maintain scalable infrastructure for production environments. The role focuses on improving reliability, monitoring, and performance using Kubernetes, Terraform, and Datadog.
Location: Hyderabad
Role Type: Full Time
Published On: 28 October 2025
Experience: 7+ Years
Role and Responsibilities
- Design, build, and maintain scalable, reliable infrastructure for production and development environments.
- Automate infrastructure provisioning, configuration, and deployments using Infrastructure as Code (IaC) tools.
- Monitor system performance and implement proactive strategies to improve uptime and availability.
- Respond to incidents, troubleshoot production issues, and perform Root Cause Analysis (RCA) to ensure quick recovery.
- Define and implement Service-Level Indicators (SLIs), Service-Level Objectives (SLOs), and Service-Level Agreements (SLAs).
- Collaborate with software engineering teams to enhance application reliability, scalability, and performance.
- Conduct post-incident reviews and implement tools or processes to prevent issue recurrence.
- Create monitoring dashboards and analytics reports using Datadog and similar observability platforms.
- Demonstrate proficiency in Kubernetes, Git, and Terraform for automation and deployment workflows.
- Maintain clear, detailed documentation for system architecture, configurations, and operational procedures.
Skills And Experience
- 7+ years of experience in Site Reliability Engineering (SRE), Infrastructure Engineering, or related domains.
- Strong expertise in Kubernetes, Terraform, and Git-based workflows.
- Hands-on experience with observability and monitoring tools such as Datadog.
- Proven experience implementing automation and Infrastructure as Code (IaC) best practices.
- In-depth understanding of SLIs, SLOs, and SLAs.
- Excellent analytical, troubleshooting, and problem-solving skills.
- Ability to thrive in a fast-paced, collaborative engineering environment.
Additional Information
- Company Name: GSPANN Technologies, Inc
- Industry: N/A
- Department: N/A
- Role Category: N/A
- Job Role: Mid-Senior level
- Education: No Restriction
- Job Types: On-site
- Employment Types: Full-Time
- Gender: No Restriction
- Notice Period: Less Than 30 Days
- Year of Experience: 1 - Any Yrs
- Job Posted On: 2 months ago
- Application Ends: 2 weeks left to apply