Technical Program Manager (Platform and SRE)

Bengaluru, Karnataka, India

Program Management Cloud Computing Cloud RBAC Engineering

Salary Not Disclosed

6 days left to apply

Job Description

Role: Technical Program Manager (Platform and SRE)

Function: Program Management / Site Reliability

Location: Bengaluru

Industry: AI infrastructure, Cloud Computing

About Company

This role is with a rapidly growing AI infrastructure startup founded in 2025 in Bengaluru by a leadership team with deep product, cloud, and systems experience from global-scale tech companies. The company has built a GenAI-powered private cloud platform that automates and manages complex AI workloads across hybrid, on-prem, edge, and sovereign cloud environments, designed for enterprise sectors where performance, data security, and compliance are critical.

Backed by leading global VCs and prominent operators (approx. $10M seed raised), the company is recognized for strong engineering rigor and product clarity. Its platform focuses on AI-native orchestration, deep observability, and cost/performance optimization to help large enterprises deploy and scale AI with confidence. This is an opportunity to join early and shape the future of AI-first cloud infrastructure.

Position Overview

You orchestrate mission-critical platform and SRE programs that power an AI-first cloud used by security-sensitive enterprises. You partner with engineering and leadership to deliver secure, observable, and cost-optimized infrastructure that enables customers to run AI workloads with confidence. Your work sets incident management standards, identity controls, and FinOps insights that influence the platform's rapid growth.

Role & Responsibilities

Run end-to-end programs for platform capabilities, including logging, metrics, traces, dashboards, alert policies, and cost views.

Drive security and identity initiatives covering key management, RBAC, SSO, baseline policies, and audit trails.

Coordinate delivery of platform Infrastructure-as-Code modules, shared environments, and drift detection in partnership with SRE and infrastructure teams.

Standardize incident management for data and AI platforms: define SLOs, create runbooks, manage rollout strategy, and lead post-incident reviews.

Track FinOps metrics for GPU and general compute, and present usage and optimization insights to leadership.

Must-Have Criteria

4-5 years in technical program management or platform/SRE/DevOps roles running multi-team deliveries.

Solid understanding of cloud infrastructure, containers and Kubernetes, observability stacks, and CI/CD.

Hands-on exposure to Terraform or other Infrastructure-as-Code tools and platform security concepts (identity, secrets, policy as code).

Bachelor's degree in Computer Science, Engineering, or equivalent experience.

Nice to Have

Experience with API gateways or service meshes.

Prior work in high-growth or distributed teams.

Familiarity with ML/AI platform SRE.