Google Cloud Platform
Actively Reviewing the Applications
People Prime Worldwide
Bengaluru, Karnataka, India
Full-Time
Posted 5 months ago
Apply by May 4, 2026
Job Description
About the Company
Our client is a trusted global innovator of IT and business services, present in 50+ countries. They specialize in digital & IT modernization, consulting, managed services, and industry-specific solutions. With a commitment to long-term success, they empower clients and society to move confidently into the digital future.
Important Note (Please Read Before Applying)
Do NOT apply if:
You have less than 5 years of experience in GCP.
You do not have hands-on experience with PyTorch and TensorFlow.
You are on a notice period longer than 15 days.
Apply ONLY if you meet ALL criteria above. Random / irrelevant applications will not be processed.
Job Title: Google Cloud Platform
Location: Remote (Global) | Preferred: US / EU Time Zones
Job Type: Full-Time
Experience Required: 8+ Years
About the Role:
We're looking for a Senior ML Inference Engineer with deep expertise in containerized ML workflows, large model inference, and cloud-native deployment. You'll work at the intersection of MLOps, deep learning, and cloud infrastructure, helping productionize some of the most powerful language models available today.
Key Responsibilities:
Deploy and optimize large-scale models (e.g., Mixtral, Gemma) for inference performance and latency.
Build and maintain highly optimized Docker containers, using multi-stage builds and best practices for performance and security.
Work with high-performance inference servers such as vLLM for efficient GPU utilization in production.
Manage and automate deployment on Google Cloud Platform (GCP) using tools like GKE, Cloud Run, and Artifact Registry.
Support model deployment pipelines for Google Cloud's Model Garden, handling complex dependency resolution.
Write and maintain clear, reproducible documentation for container builds, deployment processes, and system management.
Requirements:
8+ years of experience in software/ML engineering roles.
Advanced knowledge of PyTorch and TensorFlow.
Strong hands-on experience with Docker and container-based deployment workflows.
Proven experience with ML inference optimization, especially in GPU-accelerated environments.
Familiarity with LLMs and open-weight models (e.g., Mixtral, Gemma, LLaMA, Falcon).
Solid grasp of Google Cloud services for container orchestration.
Excellent communication skills and a passion for clean, scalable infrastructure.
Nice to Have:
Experience with other inference frameworks (e.g., TensorRT, ONNX Runtime, DeepSpeed).
Familiarity with CI/CD pipelines and infrastructure as code.