Mobius - DevOps Engineer - GPU
Hyderabad, Telangana, India
1 month ago
1 month left to apply
Job Description
About the Role
We are seeking an experienced DevOps Engineer to join our infrastructure team, with a strong focus on managing and optimizing GPU-based compute environments for machine learning and deep learning workloads. In this role, you will be responsible for the end-to-end infrastructure lifecycle, from provisioning with Terraform/Ansible to deploying ML models using modern frameworks like Hugging Face and Ollama.
Key Responsibilities
- Manage infrastructure using Terraform and Ansible
- Deploy and monitor Kubernetes clusters with GPU support (including NVIDIA drivers and H100 SXM integration)
- Implement and manage inferencing frameworks such as Ollama, Hugging Face, etc.
- Support containerization (Docker), logging (EFK), and monitoring (Prometheus/Grafana)
- Handle GPU resource scheduling, isolation, and scaling for ML/DL workloads
- Collaborate closely with developers, data scientists, and ML engineers to streamline deployments and performance
Required Skill Set
- 5-8 years of hands-on experience in DevOps and infrastructure automation
- Proven experience in managing GPU-based compute environments
- Strong understanding of Docker, Kubernetes, and Linux internals
- Familiarity with GPU server hardware and instance types
- Proficient in scripting with Python and Bash
- Good understanding of ML model deployment, inferencing workflows, and resource utilization/metering
Nice To Have
- Experience with AI/ML pipelines
- Knowledge of cloud-native technologies (AWS/GCP/Azure) supporting GPU workloads
- Exposure to model performance benchmarking and A/B testing
(ref:hirist.tech)
Additional Information
- Company Name: Mobius by Gaian
- Industry: N/A
- Department: N/A
- Role Category: SRE (Site Reliability Engineer)
- Job Role: Mid-Senior level
- Education: No Restriction
- Job Types: On-site
- Gender: No Restriction
- Notice Period: Less Than 30 Days
- Year of Experience: 1 - Any Yrs
- Job Posted On: 1 month ago
- Application Ends: 1 month left to apply