Lead Engineer - DevOps

Actively reviewing applications

Aspire

Gurugram, Haryana, India • Full-Time

Posted 22 hours ago • Apply by June 7, 2026

Job Description

About the team:

At Aspire, our core product relies on a robust, scalable, and highly available infrastructure. The DevOps team is the backbone of our engineering success, responsible for building, maintaining, and automating our CI/CD pipelines, cloud infrastructure, and operational tools. We drive a culture of automation, reliability, and security, ensuring our engineers can rapidly and safely deploy code, ultimately delivering a seamless experience to our customers.

Key Responsibilities

Infrastructure Automation & Management - Lead the design, implementation, and maintenance of our cloud infrastructure (primarily AWS) using Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
CI/CD Pipeline Ownership - Own and enhance the entire continuous integration and continuous deployment process, ensuring fast, secure, and reliable releases across development, staging, and production environments (using GitHub Actions, Jenkins, or similar).
System Reliability & Scalability - Drive initiatives to improve system monitoring, alerting, and logging (e.g., Prometheus, Grafana, ELK stack). Implement and manage auto-scaling solutions to ensure high availability and performance under load.
Security Integration (DevSecOps) - Collaborate with the Security team to integrate security tools (SAST, SCA, vulnerability scanning) directly into the CI/CD pipeline and enforce security best practices across the infrastructure.
Operational Excellence - Define and track key performance indicators (KPIs) for infrastructure health and deployment velocity. Establish and maintain runbooks, disaster recovery plans, and incident response procedures.
Mentorship & Leadership - Mentor junior team members, set technical direction, and champion best practices in DevOps, SRE, and cloud-native technologies across the engineering organization.

Minimum qualifications:

Education & Experience - Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience. 5+ years of progressive hands-on experience in DevOps, SRE, or Infrastructure Engineering, with at least 1 year in a technical leadership role.
Cloud Expertise (AWS) - Deep practical experience designing, deploying, and managing complex systems in AWS. Strong proficiency with core services like EC2, VPC, IAM, S3, RDS, ECS/EKS.
Infrastructure as Code (IaC) - Expert-level proficiency with Terraform, CloudFormation, or Ansible for managing infrastructure at scale.
CI/CD Proficiency - Extensive experience building, optimizing, and maintaining CI/CD pipelines (e.g., GitHub Actions, GitLab CI, Jenkins) for microservices and monolithic applications.
Containerization & Orchestration - Strong hands-on experience with Docker and Kubernetes (EKS/ECS preferred) for deployment and cluster management.
Scripting & Automation - Advanced proficiency in scripting languages (Python, Bash, or Go) for system automation, tool development, and API integration.
Monitoring & Observability - Experience implementing and managing comprehensive monitoring and logging solutions (e.g., Prometheus, Grafana, ELK stack, Datadog) to ensure high visibility into system performance.

Preferred qualifications:

Networking and Security - Advanced knowledge of networking fundamentals (TCP/IP, DNS, Load Balancing) and cloud security best practices, including WAF management (e.g., Cloudflare) and security group/NACL design.
Database Operations - Experience with database administration, scaling, and high-availability configuration for modern databases (e.g., PostgreSQL, MongoDB, Redis).
Advanced Kubernetes - Experience with service mesh (e.g., Istio), Helm, or advanced cluster autoscaling configurations.
Compliance & GRC - Familiarity with compliance standards (e.g., ISO 27001, SOC 2) and experience implementing automated controls for audit readiness.
SRE Principles - Deep understanding and application of Site Reliability Engineering (SRE) principles, including SLOs, error budgets, and incident management.