Databricks AI Platform SRE_Director_Infrastructure Production Management & Reliability Engineering

India, Karnataka, Bengaluru

1 week ago

Applicants: 0

Salary Not Disclosed

2 weeks left to apply

Job Description

Profile Description

We're seeking someone to join our Enterprise Technology team as a Databricks AI Platform SRE on the Platform SRE team in Enterprise Computing (EC). This role will be critical in designing, building, and optimizing a scalable, secure, and developer-friendly Databricks platform to enable Machine Learning (ML) and Artificial Intelligence (AI) workloads at enterprise scale. You will partner with ML engineers, data scientists, platform teams, and cloud architects to automate infrastructure, enforce best practices, and streamline the end-to-end ML lifecycle using modern cloud-native technologies.

In the Technology division, we leverage innovation to build the connections and capabilities that power our Firm, enabling our clients and colleagues to redefine markets and shape the future of our communities. This is a Director position that maintains the stability and reliability of the organization's infrastructure systems, ensuring optimal performance and availability to support business operations.

Since 1935, Morgan Stanley has been known as a global leader in financial services, always evolving and innovating to better serve our clients and our communities in more than 40 countries around the world.

What You'll Do In The Role

- Design and implement secure, scalable, and automated Databricks environments to support AI/ML workloads.
- Develop infrastructure-as-code (IaC) solutions using Terraform for provisioning Databricks, cloud resources, and network configurations.
- Build automation and self-service capabilities using Python, Java, and APIs for platform onboarding, workspace provisioning, orchestration, and monitoring.
- Collaborate with data science and ML teams to define compute requirements, governance policies, and efficient workflows across dev/qa/prod environments.
- Integrate the Databricks offering with cloud-native services on Azure/AWS.
- Champion CI/CD and GitOps for managing ML infrastructure and configurations.
- Ensure compliance with enterprise security and data governance policies using RBAC, audit controls, encryption, network isolation, and policies.
- Monitor platform performance, reliability, and usage, and drive improvements to optimize cost and resource utilization.

What You'll Bring To The Role

- At least 4 years' relevant experience would generally be expected to find the skills required for this role.
- Proven experience with Terraform for building and managing infrastructure.
- Strong programming skills in Python and Java.
- Hands-on experience with cloud networking, identity and access management, key vaults, monitoring, and logging in Azure.
- Hands-on experience with Databricks (Workspace management, Clusters, Jobs, MLflow, Delta Lake, Unity Catalog, Mosaic AI).
- Deep understanding of Azure or AWS infrastructure (e.g. IAM, VNets/VPC, Storage, Networks, Compute, Key management, monitoring).
- Strong experience in distributed system design, development, and deployment using Agile/DevOps practices.
- Experience with CI/CD pipelines (GitHub Actions or similar).
- Experience implementing monitoring and observability using Prometheus, Grafana, or Databricks-native solutions.
- Good communication skills, excellent teamwork experience, and the ability to mentor and develop more junior developers, including participating in constructive code reviews.
- Experience in multi-cloud environments (AWS/GCP) is a bonus.
- Experience in working in highly regulated environments (finance, healthcare, etc.) is desirable.
- Experience with Databricks REST APIs and SDKs.
- Knowledge of MLflow, Mosaic AI, and MLOps tooling.
- Working with teams using Scrum, Kanban, or other agile practices.
- Proficiency with standard Linux command line and debugging tools.
- Azure or AWS Certifications.

What You Can Expect From Morgan Stanley

We are committed to maintaining the first-class service and high standard of excellence that have defined Morgan Stanley for over 89 years. Our values - putting clients first, doing the right thing, leading with exceptional ideas, committing to diversity and inclusion, and giving back - aren't just beliefs, they guide the decisions we make every day to do what's best for our clients, communities and more than 80,000 employees in 1,200 offices across 42 countries.

At Morgan Stanley, you'll find an opportunity to work alongside the best and the brightest, in an environment where you are supported and empowered. Our teams are relentless collaborators and creative thinkers, fueled by their diverse backgrounds and experiences. We are proud to support our employees and their families at every point along their work-life journey, offering some of the most attractive and comprehensive employee benefits and perks in the industry. There's also ample opportunity to move about the business for those who show passion and grit in their work.

To learn more about our offices across the globe, please copy and paste https://www.morganstanley.com/about-us/global-offices into your browser.

Morgan Stanley is an equal opportunities employer. We work to provide a supportive and inclusive environment where all individuals can maximize their full potential. Our skilled and creative workforce is comprised of individuals drawn from a broad cross section of the global communities in which we operate and who reflect a variety of backgrounds, talents, perspectives, and experiences. Our strong commitment to a culture of inclusion is evident through our constant focus on recruiting, developing, and advancing individuals based on their skills and talents.

Additional Information

Company Name: Morgan Stanley
Industry: N/A
Department: N/A
Role Category: SRE (Site Reliability Engineer)
Job Role: Mid-Senior level
Education: No Restriction
Job Types: On-site
Gender: No Restriction
Notice Period: Immediate Joiner
Year of Experience: 1 - Any Yrs
Job Posted On: 1 week ago
Application Ends: 2 weeks left to apply
