Salary Not Disclosed

3 weeks left to apply

Job Description

Scope

The Cloud Application Monitoring Lead will drive proactive monitoring strategy, optimize alerting systems, and enhance operational visibility for our mission-critical applications. This role focuses on improving monitoring effectiveness, reducing noise, and minimizing customer-impacting incidents through automation, analytics, and best practices.

Our Current Technical Environment

Microsoft Azure
VMware ESXi

What You'll Do

Monitoring Strategy & Governance
Define and implement enterprise-level monitoring frameworks for cloud and hybrid applications.
Establish monitoring KPIs, SLAs, and continuous improvement processes.
Drive alert noise reduction initiatives and enhance signal-to-noise ratio.

Operational Excellence
Analyze incident trends and monitoring gaps to recommend preventive measures.
Implement root cause analysis (RCA) processes for monitoring-related issues.
Ensure all critical applications have effective health, performance, and availability monitoring.

Tooling & Automation
Lead integration and optimization of monitoring tools (AppDynamics, Zabbix, Splunk, etc.).
Oversee automation of repetitive monitoring tasks and alert triaging.
Evaluate and recommend new monitoring technologies.

Collaboration & Leadership
Partner with application, infrastructure, DevOps, and SRE teams to improve observability.
Mentor monitoring engineers and ensure adherence to best practices.
Act as the escalation point for critical monitoring-related issues.

What We Are Looking For

Experience: 9-12 years in IT Operations, Database, Application Monitoring, or related roles, with at least 3 years in a lead or managerial position.

Technical Expertise:
Strong knowledge of monitoring tools such as AppDynamics, Zabbix, Splunk, and related integrations.
Experience in cloud platforms (AWS, Azure, or GCP) and containerized environments (Kubernetes, Docker).
Familiarity with logging, metrics, and tracing frameworks (e.g., ELK stack, Prometheus, Grafana).
Experience in deploying and managing SaaS.
Familiarity with supply chain management products (optional, but gives candidates an edge).

Process & Improvement: Proven track record in reducing alert noise, improving monitoring coverage, and preventing customer-impacting incidents.

Soft Skills: Strong analytical, problem-solving, and communication skills. Ability to influence and lead cross-functional teams.

Preferred Qualifications:
ITIL Foundation or similar certification.
Experience with AIOps platforms and machine learning-driven monitoring.
Background in performance engineering or capacity planning.

Immediate joiners preferred.

Our Values

If you want to know the heart of a company, take a look at their values. Ours unite us. They are what drive our success, and the success of our customers. Does your heart beat like ours? Find out here: Core Values

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or protected veteran status.

Required Skills

Analytics Azure Docker Prometheus ITIL

Additional Information

Company Name
Blue Yonder
Industry
N/A
Department
N/A
Role Category
Robotics Software Engineer
Job Role
Mid-Senior level
Education
No Restriction
Job Types
Remote
Gender
No Restriction
Notice Period
Less Than 30 Days
Years of Experience
1 - Any Yrs
Job Posted On
4 days ago
Application Ends
3 weeks left to apply