About The Position

We are seeking a forward‑thinking and execution‑oriented IT Infrastructure & Operations, Associate Director (Monitoring & Observability) to provide thought leadership and hands‑on delivery of modern observability practices across our IT infrastructure and applications. This role will lead the strategy, architecture, and implementation of monitoring, logging, tracing, and AI‑driven observability capabilities that enable proactive detection, rapid diagnosis, and continuous optimization of system reliability and performance. The ideal candidate brings a strong point of view on observability maturity, leverages automation and AIOps, and has a proven track record of turning vision into scalable, operational solutions.

Hybrid: Eight days a month, we come together in the closest U.S. office (within 50 miles) to experience the value of connecting with colleagues.

Requirements

  • Minimum of 5 years of technical/leadership experience in a comparable role.
  • Proven experience leading technical teams and serving as a trusted voice on observability, monitoring strategy, and operational excellence.
  • 2+ years of experience as a people manager, with direct accountability for team performance.
  • Demonstrated success leading a team of ~6 direct reports.
  • Experience applying AIOps, machine learning, or advanced analytics to improve signal quality, detection, and root‑cause analysis.
  • Demonstrated success driving complex initiatives from concept to production with measurable outcomes.
  • Strong background working with SRE, incident management, and reliability frameworks.
  • Hands‑on and architectural experience with monitoring and ITSM platforms.
  • Experience in alert tuning, capacity monitoring, performance analysis, and systematic problem elimination.
  • Ability to partner across engineering, operations, and leadership teams to drive adoption and change.
  • Proven ability to manage large‑scale, multi‑stakeholder initiatives with clear delivery milestones.
  • Must be legally authorized to work in the United States without employer sponsorship, now or in the future.

Responsibilities

  • Lead, mentor, and inspire a team of monitoring and observability engineers, setting a clear technical vision and fostering a culture of ownership, innovation, and delivery excellence.
  • Act as a thought leader for observability, defining and driving an enterprise‑wide strategy that incorporates modern telemetry (metrics, logs, traces), AI/AIOps, automation, and reliability engineering principles.
  • Own end‑to‑end implementation and delivery of observability initiatives, from design and proof‑of‑concept through production rollout, adoption, and continuous optimization.
  • Oversee the implementation, scaling, and modernization of observability platforms and tools (e.g., Datadog, Splunk, Prometheus, Grafana, New Relic, ELK, OpenTelemetry), ensuring reliability, performance, and cost effectiveness.
  • Champion AI‑driven observability and analytics, including anomaly detection, predictive alerting, noise reduction, and intelligent root‑cause analysis.
  • Define and enforce instrumentation standards, monitoring coverage requirements, dashboarding conventions, and SLO / SLA frameworks aligned with business outcomes.
  • Partner closely with infrastructure, platform, DevOps, application, security, and incident response teams to embed observability into system design, CI/CD pipelines, and operational workflows.
  • Lead post‑incident reviews and continuous improvement efforts, translating insights into concrete enhancements that reduce recurrence, alert fatigue, mean time to detect (MTTD), and mean time to resolve (MTTR).
  • Deliver data‑driven insights and executive‑level reporting on system health, reliability, capacity trends, and observability maturity.
  • Evaluate, select, and manage vendor relationships and tool roadmaps, influencing platform direction and driving maximum value from investments.
  • Stay ahead of emerging trends in observability, AIOps, and reliability engineering, and translate those ideas into strategy and practical solutions for the organization.

Benefits

  • Medical, Dental, & Vision Plans
  • 401(k)
  • FSA/HSA
  • Commuter Benefits
  • Tuition Assistance Plan
  • Vacation and Sick Time
  • Paid Parental Leave

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: None listed
  • Number of Employees: 5,001-10,000
