Engineer Site Reliability
Empower
Job Description
Our vision for the future is based on the idea that transforming financial lives starts by giving our people the freedom to transform their own. We have a flexible work environment and fluid career paths. We not only encourage but celebrate internal mobility. We also recognize the importance of purpose, well-being, and work-life balance. Within Empower and our communities, we work hard to create a welcoming and inclusive environment, and our associates dedicate thousands of hours to volunteering for causes that matter most to them.
Chart your own path and grow your career while helping more customers achieve financial freedom. Empower Yourself.
As a Site Reliability Engineer at Empower, you'll be a key contributor to ensuring the reliability, scalability, and performance of our financial services platform. Working within value streams, you'll operate production systems serving millions of customers while maintaining the high availability standards required in fintech.
ESSENTIAL FUNCTIONS:
Own operational excellence for assigned systems and services within your value stream
Participate in on-call rotations, responding to incidents and driving them to resolution
Lead postmortem processes for incidents, identifying root causes and implementing preventative measures
Build and maintain infrastructure as code using Terraform across multiple AWS environments
Manage and optimize EKS clusters, implementing best practices for container orchestration
Design and implement monitoring, alerting, and observability solutions using Datadog and Splunk
Develop automation tools and scripts to reduce toil and improve operational efficiency
Collaborate with development teams on deployment strategies, implementing progressive delivery patterns
Maintain and improve CI/CD pipelines in GitLab CI and Jenkins
Contribute to capacity planning and performance optimization initiatives
Mentor entry-level SREs, providing guidance on operational best practices
Document runbooks, architecture decisions, and system behaviors
QUALIFICATIONS:
Required:
Bachelor's degree in Computer Science, Information Technology, or related field (or equivalent practical experience)
2-4 years of experience in Site Reliability Engineering, DevOps, or Systems Engineering
Production experience with Kubernetes, including deployment, troubleshooting, and optimization
Proficiency in Infrastructure as Code, particularly Terraform
Solid programming skills in Python, Go, or similar languages
Experience with observability platforms (Datadog, Splunk, or similar)
Understanding of CI/CD principles and experience with GitLab CI, Jenkins, or equivalent
Knowledge of networking fundamentals and troubleshooting
Familiarity with GitOps workflows and practices
Experience participating in on-call rotations and incident management
Understanding of high-availability architecture patterns
Preferred:
Experience in financial services or highly regulated industries
Familiarity with compliance frameworks (SOC 2, PCI DSS)
Experience with service mesh technologies (Istio, Linkerd)
AWS certifications (Solutions Architect Associate or higher)
CKA (Certified Kubernetes Administrator) certification
Experience with disaster recovery and business continuity planning
Background in site reliability engineering practices and SLO/SLI methodologies
Technical Environment
AWS | EKS | Kubernetes | Terraform | Datadog | Splunk | GitOps | GitLab CI | Jenkins | Python | Go | Helm | Prometheus
What Success Looks Like
Systems maintain 99.9%+ availability with minimal unplanned downtime
Automation initiatives reduce operational toil by measurable margins
Positive collaboration with development teams on reliability improvements
On-call load is manageable through proactive reliability work
This job description is not intended to be an exhaustive list of all duties, responsibilities and qualifications of the job. The employer has the right to revise this job description at any time. You will be evaluated in part based on your performance of the responsibilities and/or tasks listed in this job description. You may be required to perform other duties that are not included in this job description. The job description is not a contract for employment, and either you or the employer may terminate employment at any time, for any reason.
We are an equal opportunity employer with a commitment to diversity. All individuals, regardless of personal characteristics, are encouraged to apply. All qualified applicants will receive consideration for employment without regard to age, race, color, national origin, ancestry, sex, sexual orientation, gender, gender identity, gender expression, marital status, pregnancy, religion, physical or mental disability, military or veteran status, genetic information, or any other status protected by applicable state or local law.
Workplace Flexibility: Remote