Google Cloud Platform


People Prime Worldwide

Bengaluru, Karnataka, India • Full-Time

Posted 5 months ago • Apply by May 4, 2026

Job Description

About the Company

Our client is a trusted global innovator of IT and business services, present in 50+ countries. They specialize in digital and IT modernization, consulting, managed services, and industry-specific solutions. With a commitment to long-term success, they empower clients and society to move confidently into the digital future.

Important Note (Please Read Before Applying)

Do NOT apply if:
- You have less than 5 years of experience with GCP.
- You do not have hands-on experience with PyTorch and TensorFlow.
- You are on a notice period longer than 15 days.

Apply ONLY if you meet ALL of the criteria above. Random or irrelevant applications will not be processed.

Job Title: Google Cloud Platform
Location: Remote (Global) | Preferred: US / EU Time Zones
Job Type: Full-Time
Experience Required: 8+ Years

About the Role

We're looking for a Senior ML Inference Engineer with deep expertise in containerized ML workflows, large-model inference, and cloud-native deployment. You'll work at the intersection of MLOps, deep learning, and cloud infrastructure, helping productionize some of the most powerful language models available today.

Key Responsibilities
- Deploy and optimize large-scale models (e.g., Mixtral, Gemma) for inference performance and latency.
- Build and maintain highly optimized Docker containers using multi-stage builds and best practices for performance and security.
- Work with high-performance inference servers such as vLLM to achieve efficient GPU utilization in production.
- Manage and automate deployments on Google Cloud Platform (GCP) using tools such as GKE, Cloud Run, and Artifact Registry.
- Support model deployment pipelines for Google Cloud's Model Garden, including complex dependency resolution.
- Write and maintain clear, reproducible documentation for container builds, deployment processes, and system management.

Requirements
- 8+ years of experience in software or ML engineering roles.
- Advanced knowledge of PyTorch and TensorFlow.
- Strong hands-on experience with Docker and container-based deployment workflows.
- Proven experience with ML inference optimization, especially in GPU-accelerated environments.
- Familiarity with LLMs and open-weight models (e.g., Mixtral, Gemma, LLaMA, Falcon).
- Solid grasp of Google Cloud services for container orchestration.
- Excellent communication skills and a passion for clean, scalable infrastructure.

Nice to Have
- Experience with other inference frameworks (e.g., TensorRT, ONNX Runtime, DeepSpeed).
- Familiarity with CI/CD pipelines and infrastructure as code.