Google Cloud Platform
People Prime Worldwide
Actively reviewing applications
Bengaluru, Karnataka, India
Full-Time
On-site
Posted 3 months ago • Apply by May 4, 2026
Job Description
About the Company
Our client is a trusted global innovator of IT and business services, present in 50+ countries. They specialize in digital & IT modernization, consulting, managed services, and industry-specific solutions. With a commitment to long-term success, they empower clients and society to move confidently into the digital future.
Important Note (Please Read Before Applying)
Do NOT apply if:
You have less than 5 years of experience with GCP.
You do not have hands-on experience with PyTorch and TensorFlow.
You are on a notice period longer than 15 days.
Apply ONLY if you meet ALL criteria above. Random or irrelevant applications will not be processed.
Job Title: Google Cloud Platform
Location: Remote (Global) | Preferred: US / EU Time Zones
Job Type: Full-Time
Experience Required: 8+ Years
About the Role:
We're looking for a Senior ML Inference Engineer with deep expertise in containerized ML workflows, large model inference, and cloud-native deployment. You'll work at the intersection of MLOps, deep learning, and cloud infrastructure, helping productionize some of the most powerful language models available today.
Key Responsibilities:
Deploy and optimize large-scale models (e.g., Mixtral, Gemma) for inference performance and latency.
Build and maintain highly optimized Docker containers, using multi-stage builds and best practices for performance and security.
Work with high-performance inference servers such as vLLM for efficient GPU utilization in production.
Manage and automate deployment on Google Cloud Platform (GCP) using tools like GKE, Cloud Run, and Artifact Registry.
Support model deployment pipelines for Google Cloud's Model Garden, handling complex dependency resolution.
Write and maintain clear, reproducible documentation for container builds, deployment processes, and system management.
Requirements:
8+ years of experience in software/ML engineering roles.
Advanced knowledge of PyTorch and TensorFlow.
Strong hands-on experience with Docker and container-based deployment workflows.
Proven experience with ML inference optimization, especially in GPU-accelerated environments.
Familiarity with LLMs and open-weight models (e.g., Mixtral, Gemma, LLaMA, Falcon).
Solid grasp of Google Cloud services for container orchestration.
Excellent communication skills and a passion for clean, scalable infrastructure.
Nice to Have:
Experience with other inference frameworks (e.g., TensorRT, ONNX Runtime, DeepSpeed).
Familiarity with CI/CD pipelines and infrastructure as code.