Databricks AI Platform SRE_Director_Infrastructure Production Management & Reliability Engineering

India, Karnataka, Bengaluru

1 week ago

Applicants: 0

Salary Not Disclosed

2 weeks left to apply

Job Description

Profile Description

We're seeking someone to join the Platform SRE team in Enterprise Computing (EC), part of our Enterprise Technology team, as a Databricks AI Platform SRE. This role will be critical in designing, building, and optimizing a scalable, secure, and developer-friendly Databricks platform to enable Machine Learning (ML) and Artificial Intelligence (AI) workloads at enterprise scale. You will partner with ML engineers, data scientists, platform teams, and cloud architects to automate infrastructure, enforce best practices, and streamline the end-to-end ML lifecycle using modern cloud-native technologies.

In the Technology division, we leverage innovation to build the connections and capabilities that power our Firm, enabling our clients and colleagues to redefine markets and shape the future of our communities. This is a Director position that maintains the stability and reliability of the organization's infrastructure systems, ensuring optimal performance and availability to support business operations.

Since 1935, Morgan Stanley has been known as a global leader in financial services, always evolving and innovating to better serve our clients and our communities in more than 40 countries around the world.

What You'll Do In The Role

- Design and implement secure, scalable, and automated Databricks environments to support AI/ML workloads.
- Develop infrastructure-as-code (IaC) solutions using Terraform for provisioning Databricks, cloud resources, and network configurations.
- Build automation and self-service capabilities using Python, Java, and APIs for platform onboarding, workspace provisioning, orchestration, and monitoring.
- Collaborate with data science and ML teams to define compute requirements, governance policies, and efficient workflows across dev/qa/prod environments.
- Integrate the Databricks offering with cloud-native services on Azure/AWS.
- Champion CI/CD and GitOps for managing ML infrastructure and configurations.
- Ensure compliance with enterprise security and data governance policies using RBAC, audit controls, encryption, network isolation, and policies.
- Monitor platform performance, reliability, and usage, and drive improvements to optimize cost and resource utilization.

What You'll Bring To The Role

- At least 4 years' relevant experience would generally be expected to find the skills required for this role.
- Proven experience with Terraform for building and managing infrastructure.
- Strong programming skills in Python and Java.
- Hands-on experience with cloud networking, identity and access management, key vaults, monitoring, and logging in Azure.
- Hands-on experience with Databricks (workspace management, clusters, jobs, MLflow, Delta Lake, Unity Catalog, Mosaic AI).
- Deep understanding of Azure or AWS infrastructure (e.g. IAM, VNets/VPC, storage, networks, compute, key management, monitoring).
- Strong experience in distributed system design, development, and deployment using agile/DevOps practices.
- Experience with CI/CD pipelines (GitHub Actions or similar).
- Experience implementing monitoring and observability using Prometheus, Grafana, or Databricks-native solutions.
- Good communication skills, excellent teamwork experience, and the ability to mentor and develop more junior developers, including participating in constructive code reviews.
- Experience in multi-cloud environments (AWS/GCP) is a bonus.
- Experience in working in highly regulated environments (finance, healthcare, etc.) is desirable.
- Experience with Databricks REST APIs and SDKs.
- Knowledge of MLflow, Mosaic AI, and MLOps tooling.
- Working with teams using Scrum, Kanban, or other agile practices.
- Proficiency with standard Linux command line and debugging tools.
- Azure or AWS certifications.

What You Can Expect From Morgan Stanley

We are committed to maintaining the first-class service and high standard of excellence that have defined Morgan Stanley for over 89 years. Our values (putting clients first, doing the right thing, leading with exceptional ideas, committing to diversity and inclusion, and giving back) aren't just beliefs; they guide the decisions we make every day to do what's best for our clients, communities, and more than 80,000 employees in 1,200 offices across 42 countries. At Morgan Stanley, you'll find an opportunity to work alongside the best and the brightest, in an environment where you are supported and empowered. Our teams are relentless collaborators and creative thinkers, fueled by their diverse backgrounds and experiences. We are proud to support our employees and their families at every point along their work-life journey, offering some of the most attractive and comprehensive employee benefits and perks in the industry. There's also ample opportunity to move about the business for those who show passion and grit in their work.

To learn more about our offices across the globe, please copy and paste https://www.morganstanley.com/about-us/global-offices into your browser.

Morgan Stanley is an equal opportunities employer. We work to provide a supportive and inclusive environment where all individuals can maximize their full potential. Our skilled and creative workforce is comprised of individuals drawn from a broad cross section of the global communities in which we operate and who reflect a variety of backgrounds, talents, perspectives, and experiences. Our strong commitment to a culture of inclusion is evident through our constant focus on recruiting, developing, and advancing individuals based on their skills and talents.

Additional Information

Company Name: Morgan Stanley
Industry: N/A
Department: N/A
Role Category: SRE (Site Reliability Engineer)
Job Role: Mid-Senior level
Education: No Restriction
Job Types: On-site
Gender: No Restriction
Notice Period: Immediate Joiner
Year of Experience: 1 - Any Yrs
Job Posted On: 1 week ago
Application Ends: 2 weeks left to apply
