Site Reliability Engineer (SRE) – Core IT Infrastructure

Actively reviewing applications

TECEZE

Chennai, Tamil Nadu, India • Full-Time • INR 10–16 LPA

Posted 2 months ago • Apply by May 23, 2026

Job Description

Role: Site Reliability Engineer (SRE) – Core IT Infrastructure

Location: Chennai

Work mode: On-site (full-time)

Experience: 6+ years

Key Responsibilities

Infrastructure Reliability & Operations

• Design, implement, and maintain highly available and fault-tolerant infrastructure

• Ensure reliability, performance, scalability, and security of core IT systems

• Monitor system health, capacity, and performance using proactive observability practices

• Lead incident response, root cause analysis (RCA), and post-incident reviews

Automation & SRE Development

• Develop and maintain automation tools, scripts, and frameworks to reduce manual operations

• Apply Infrastructure as Code (IaC) principles using tools such as Terraform, Ansible, or CloudFormation

• Build self-healing systems and automate repetitive operational tasks

• Improve deployment pipelines and operational workflows through engineering solutions

DevOps & Platform Engineering

• Collaborate with DevOps, development, and security teams to support CI/CD pipelines

• Enable seamless application deployments with minimal downtime

• Support containerization and orchestration platforms (Docker, Kubernetes, OpenShift)

• Implement best practices for configuration management and environment consistency

Monitoring, Observability & Performance

• Design and maintain monitoring, logging, and alerting systems

• Define and track SLIs, SLOs, and SLAs

• Optimize system performance and cost efficiency, and drive capacity planning

• Enhance observability using tools such as Prometheus, Grafana, ELK, Datadog, or similar

Security & Compliance

• Implement infrastructure security best practices

• Collaborate with security teams on vulnerability management and compliance requirements

• Ensure secure access, identity management, and audit readiness

⸻

Required Skills & Qualifications

Technical Skills

• Strong experience in Linux/Unix system administration

• Proficiency in programming and scripting (Python, Go, Bash/shell, or similar)

• Experience with cloud platforms (AWS, Azure, or GCP)

• Hands-on experience with containerization and orchestration (e.g., Docker, Kubernetes)

• Knowledge of networking concepts (DNS, TCP/IP, load balancing, firewalls)

• Experience with monitoring, logging, and alerting tools

Required Skills

IT Systems, Prometheus, Grafana, Shell
