MLOps Engineer
Yotta Data Services Private Limited
Mumbai, Maharashtra, India
Full-Time
Posted 4 months ago • Apply by May 4, 2026
Job Description
About the Role:
We're looking for a strategic Senior MLOps Engineer to lead the end-to-end design, implementation, and scaling of our AI infrastructure. You'll partner with researchers, product teams, and DevOps to turn prototypes into production services that meet strict SLAs for latency, reliability, and cost efficiency.
Responsibilities:
• Core MLOps Pipelines: Design and implement scalable ML pipelines (training, evaluation, deployment) for LLMs, CV, and multimodal models.
• Model Serving & CI/CD: Lead efforts in model serving, versioning, automated CI/CD, and real-time monitoring of AI workflows.
• Inference-as-a-Service: Build and optimize GPU-backed serving infrastructure targeting p99 latency < 100 ms, 99.9% uptime, and > 80% GPU utilization.
• Governance & Drift Detection: Drive initiatives on model governance, automated drift detection (≤ 10% false positives), and data-management best practices.
• Vector Search & Agent Orchestration: Integrate vector databases (Qdrant, Pinecone) for low-latency semantic retrieval, and build agentic workflows using LangChain or similar frameworks.
• Enterprise Multi-Tenancy: Architect RBAC-driven, isolated ML services to securely serve 100–500+ organizations.
• Observability & Logging: Design Prometheus/Grafana dashboards, ELK/Fluentd logging pipelines, and alerting for all ML workloads.
• CI/CD for Inference APIs: Maintain CI/CD pipelines for Python (FastAPI) and TypeScript (NestJS) inference services.
• Metrics & Cost Optimization: Define and track SLAs/SLOs, optimize cloud spend by ≥ 20% year-over-year, and ensure GPU clusters operate at > 80% utilization.
• Cross-Functional Leadership: Partner with AI researchers, product managers, and legal to align MLOps standards with compliance and roadmap goals.
• Mentorship & Community: Mentor junior engineers, run quarterly brown-bags, own onboarding docs (upskill 5+ engineers/quarter), and publish ≥ 1 open-source contribution or talk annually.
Requirements:
• 9–15 years in software engineering, including ≥ 4 years in MLOps or ML infrastructure
• Strong expertise in cloud platforms (AWS/GCP/Azure), Kubernetes, Docker, Terraform, Helm, Kubeflow, and MLflow
• Experience with inference frameworks (Triton, TensorFlow Serving, BentoML, TorchServe)
• Familiarity with distributed training, workload schedulers, and GPU-cluster orchestration
• Proficiency in Python, TypeScript, and infrastructure-as-code (Terraform, Helm, etc.)
• Proven track record building reliable, scalable ML systems in production
Plus These Critical Skills:
• Vector DB integration (Qdrant, Pinecone)
• Agent orchestration (LangChain, LlamaIndex)
• Multi-tenant security and RBAC
• Observability stacks (Prometheus/Grafana, ELK)
• CI/CD for FastAPI/NestJS services
Minimum Qualification:
Bachelor's or Master's Degree in Computer Science, Engineering, or a related field.
Preferred:
• Prior experience at AI-focused startups or enterprises scaling ML for 100–500 orgs
• Understanding of low-latency streaming inference or agent-based LLM systems
• Excellent written and verbal communication, and a proven ability to drive consensus across functions