Site Reliability Engineer - Datadog/AWS Lambda/DynamoDB

Hyderabad, Telangana, India

3 weeks ago

Applicants: 0

Salary Not Disclosed

2 days left to apply

Job Description

Job Title: Site Reliability Engineer (SRE) - Datadog / AWS Lambda / DynamoDB / Serverless
Location: Bangalore / Pune / Hyderabad
Experience: 5-10 Years

About The Role

We are seeking an experienced Site Reliability Engineer (SRE) with strong expertise in Datadog integration, AWS Lambda, DynamoDB, and serverless architectures. The ideal candidate will be responsible for building, monitoring, and maintaining highly reliable, scalable, and secure cloud-based systems.

Key Responsibilities

Design, implement, and maintain monitoring and observability solutions using Datadog (metrics, logs, traces, dashboards, and alerts).
Develop and optimize serverless applications using AWS Lambda and related AWS services.
Manage and optimize DynamoDB for scalability, reliability, and cost efficiency.
Automate deployment and infrastructure provisioning using AWS CDK / CloudFormation / Terraform.
Implement reliability engineering practices including performance tuning, auto-scaling, and fault tolerance.
Collaborate with development teams to design and implement highly available, resilient, and secure architectures.
Troubleshoot production issues and drive root cause analysis (RCA) to ensure long-term stability.
Continuously improve CI/CD pipelines and observability frameworks.

Required Skills & Experience

5-10 years of total experience, with at least 3 years in SRE / DevOps roles.
Hands-on experience with Datadog setup and integrations (custom metrics, APM, log management).
Strong experience with AWS Lambda, DynamoDB, and other serverless services (API Gateway, Step Functions, SQS, SNS).
Proficiency in Python / Node.js / Bash scripting for automation.
Experience with IaC tools such as Terraform, CloudFormation, or AWS CDK.
Solid understanding of AWS architecture, networking, and security best practices.
Working knowledge of CI/CD tools (GitHub Actions, Jenkins, CodePipeline, etc.).
Experience with incident management, monitoring dashboards, and alerting automation.

Good To Have

Experience with Kubernetes / ECS / EKS for container orchestration.
Familiarity with CloudWatch, Prometheus, or Grafana.
AWS Certification (Solutions Architect / DevOps Engineer) preferred.

(ref:hirist.tech)

Required Skills

AWS Lambda, APM, Log management, SQS, Python

Additional Information

Company Name: BYLD Group
Industry: N/A
Department: N/A
Role Category: SRE (Site Reliability Engineer)
Job Role: Mid-Senior level
Education: No Restriction
Job Types: On-site
Gender: No Restriction
Notice Period: Less Than 30 Days
Year of Experience: 1 - Any Yrs
Job Posted On: 3 weeks ago
Application Ends: 2 days left to apply