Lead Solutions Architect - AI Infrastructure & Private Cloud
Bengaluru, Karnataka, India
1 month ago
Applicants: 0
1 month left to apply
Job Description
Job Title: Lead Solutions Architect - AI Infrastructure & Private Cloud
Location: Bengaluru (Electronic City)
Experience: 10-15 Years (Lead / Architect Level)
Position Type: Full-Time | Immediate Joiners Preferred
Criticality: High

Role Overview:
We are seeking a Lead Solutions Architect specializing in AI Infrastructure and Private Cloud to design and deliver scalable, high-performance compute environments for machine learning, deep learning, and AI workloads. The ideal candidate will have deep expertise in Kubernetes, container orchestration, GPU/TPU acceleration, and HPC (High Performance Computing) architectures, enabling AI-driven innovation across enterprise platforms.

Key Responsibilities:
- Architect, design, and implement AI/ML infrastructure solutions across private and hybrid cloud environments.
- Lead setup and optimization of Kubernetes Landing Zones, including cluster design, multi-tenancy, and security.
- Manage containerized workloads using orchestration tools (Kubernetes, Docker, Podman, OpenShift).
- Integrate AI accelerators (NVIDIA GPUs, TPUs) for ML/DL model training and inference.
- Enable deployment of deep learning models with a focus on hardware acceleration, scalability, and performance tuning.
- Build and maintain edge and cloud-native deployment pipelines for AI workloads.
- Collaborate with AI/ML and DevOps teams to ensure robust CI/CD workflows for model deployment.
- Drive HPC architecture design, including compute, storage, networking, and scheduling (SLURM, PBS, etc.).
- Optimize HPC and AI infrastructure for cost, performance, and resource utilization.
- Provide technical leadership in evaluating and integrating emerging technologies (AI frameworks, MLOps platforms, accelerator hardware).
- Define standards, documentation, and best practices for AI infrastructure operations.

Required Technical Skills:
- Containerization & Orchestration: Kubernetes, Docker, Helm, OpenShift, Rancher
- Cloud Platforms: AWS, Azure, GCP (Private & Hybrid Cloud expertise preferred)
- AI/ML Infrastructure: NVIDIA GPU integration, CUDA, TensorRT, TPUs, PyTorch/TensorFlow deployment
- High Performance Computing (HPC): HPC architecture, schedulers (SLURM, PBS), parallel computing, storage & network optimization
- DevOps & CI/CD: GitHub Actions, Jenkins, ArgoCD, Terraform, Ansible
- Monitoring & Observability: Prometheus, Grafana, ELK Stack
- Scripting/Programming: Python, Bash, YAML, Go (preferred)

Desired Skills:
- Experience with RAG/LLM model deployment pipelines or AI workload orchestration
- Knowledge of edge computing and distributed inference systems
- Exposure to AI model lifecycle management (MLOps)
- Strong problem-solving, leadership, and cross-functional collaboration skills
Additional Information
- Company Name: Tekskills Inc.
- Industry: N/A
- Department: N/A
- Role Category: Cloud Engineer
- Job Role: Mid-Senior level
- Education: No Restriction
- Job Types: Remote
- Gender: No Restriction
- Notice Period: Less Than 30 Days
- Year of Experience: 1 - Any Yrs
- Job Posted On: 1 month ago
- Application Ends: 1 month left to apply