Site Reliability Engineer
Bengaluru, Karnataka, India
2 months ago
Applicants: 0
3 weeks left to apply
Job Description
Role: Site Reliability Engineer
Experience: 8+ years

Responsibilities:
- Ensure high availability, performance, and scalability of mission-critical systems and services.
- Lead the design and implementation of resilient and fault-tolerant infrastructure.
- Drive incident response, root cause analysis, and postmortem culture. Mentor others in incident practices.
- Write and maintain operational documentation, runbooks, and architecture diagrams.
- Drive and promote protocols on production readiness and operational excellence.
- Own and evolve infrastructure automation using Terraform or similar tools to remove human intervention as far as possible.
- Help automate infrastructure provisioning and other engineering processes by working on automations built on top of an engineering platform written in GitHub Actions.
- Build internal platforms, tools, and frameworks to improve developer productivity and service reliability.
- Work closely with software engineers, platform teams, and product managers to align on company goals.
- Coach and up-skill other engineering team members.

Skills and Qualifications:
- 8-12+ years in SRE, DevOps, or related infrastructure-focused roles.
- Understanding of large-scale complex systems from a reliability perspective.
- Ability to design, implement, and maintain processes and tools.
- Passion for producing clean, standards-compliant, secure code.
- A developer mindset applied to infrastructure.
- Strong experience with Linux/Unix systems.
- Deep experience with Kubernetes.
- Deep experience with tools like Terraform, Ansible, and Helm.
- Strong scripting skills for automating tasks, using Python, Bash, or another scripting language.
- Experience with at least one relational and one non-relational database (e.g., PostgreSQL, MySQL, MongoDB, Redis, Elasticsearch).
- Ability to identify time-consuming and error-prone manual tasks and then build or leverage tooling to automate them.
- Ability to identify root causes of instability in a large-scale distributed system across stacks.
- Experience leading high-severity incident responses and postmortems.

Nice to haves / Pluses:
- Experience with cloud-based solutions such as Amazon Web Services (AWS), Google Cloud, or Microsoft Azure.
- Experience supporting scalable databases like PostgreSQL or MongoDB in production.
- Understanding of cost
Additional Information
- Company Name: HireAlpha
- Industry: N/A
- Department: N/A
- Role Category: N/A
- Job Role: Mid-Senior level
- Education: No Restriction
- Job Types: On-site
- Gender: No Restriction
- Notice Period: Less Than 30 Days
- Year of Experience: 1 - Any Yrs
- Job Posted On: 2 months ago
- Application Ends: 3 weeks left to apply