Senior AWS DevOps Engineer (ECS / CI-CD)
Hyderabad, Telangana, India
3 weeks ago
Applicants: 0
2 days left to apply
Job Description
Your mission: build and maintain a secure, automated, and observable AWS foundation so engineers can ship faster, safer, and cheaper. You'll be the owner of deployment velocity, system uptime, and cloud cost sanity across our ECS-based microservices.

What You'll Own

1. Platform Reliability
- Design and maintain ECS clusters (Fargate/EC2) for multi-service workloads.
- Implement autoscaling, health checks, and blue/green rollouts for zero-downtime deployments.
- Build observability into everything (logs, metrics, traces) to shorten MTTR.

2. Delivery Automation
- Architect and maintain CI/CD pipelines using GitHub Actions + CodePipeline/CodeBuild.
- Enforce testing, security scanning, and deployment gates as part of every release.
- Move from semi-manual deploys to fully automated pipelines across environments.

3. Network & Security
- Manage VPC architectures (subnets, routing, gateways, VPN, endpoints).
- Handle Route 53 for internal/external DNS, SSL/TLS, health checks, and routing policies.
- Maintain multi-account setup with IAM least privilege, KMS encryption, and security baselines.

4. Infrastructure as Code
- Define all infra in Terraform/CDK; no console drift.
- Use IaC reviews and environments for repeatable, compliant infrastructure.

5. Data Layer Operations
- Operate and optimize ClickHouse and PostgreSQL clusters: backups, replication, partitioning, and tuning.
- Ensure RTO/RPO objectives are met and documented.

6. Monitoring & Debugging
- Aggregate logs (CloudWatch, FireLens, OpenTelemetry).
- Build dashboards and alerts that highlight anomalies, not noise.
- Lead root-cause investigations across network, container, and app layers.
Core Tech Stack
- AWS: ECS (Fargate/EC2), EC2, S3, VPC, Route 53, CloudWatch, CodePipeline, CodeBuild
- CI/CD: GitHub Actions, Docker, Terraform/CDK
- Databases: ClickHouse, PostgreSQL
- Languages (plus): FastAPI (Python), Node.js
- Networking: DNS, VPN, load balancers, private link, peering, NAT, IGW
- Security: Multi-account strategy, IAM roles/policies, KMS, AWS Config, GuardDuty

Requirements
- 5+ years running production workloads on AWS.
- Deep knowledge of ECS, CodePipeline, EC2/VPC, S3, and Docker.
- Proven track record of shipping secure automated deployments.
- Strong understanding of networking and DNS fundamentals.
- Experience managing databases in production.
- Strong debugging and observability mindset.
- Clear written communication and operational discipline.

Nice to Have
- Familiarity with FastAPI or Node.js applications to optimize deployment flows.
- Hands-on with cost-optimization and cross-account automation (Organizations, Control Tower).
- Experience setting up VPNs, Bastion, or SSO integration.

What Success Looks Like
- All ECS services deployed via automated pipelines.
- CloudWatch dashboards and alerts in place for core systems.
- Verified ClickHouse and PostgreSQL backups/restores.
- Documented multi-account/VPC network topology.
- No manual deploys, no console changes.

Why This Role Matters
This role defines the foundation for everything we build. The more you automate, the faster teams deliver. You'll directly impact uptime, developer productivity, and cloud spend: three metrics that define operational excellence.
Additional Information
- Company Name: graph8
- Industry: N/A
- Department: N/A
- Role Category: SRE (Site Reliability Engineer)
- Job Role: Mid-Senior level
- Education: No Restriction
- Job Types: Remote
- Gender: No Restriction
- Notice Period: Less Than 30 Days
- Year of Experience: 1 - Any Yrs
- Job Posted On: 3 weeks ago
- Application Ends: 2 days left to apply