Sr. AI Observability Engineer (Remote)

Tealium
$165,000 - $200,000

About The Position

Tealium is seeking a Senior AI Observability Engineer to lead the observability strategy for its AI/ML systems and AI-powered features. This role blends advanced observability engineering with a strong understanding of AI/ML lifecycles, ensuring visibility, reliability, performance, and responsible usage of both off-the-shelf and custom AI models across products and internal platforms. The engineer will join a team of 3 observability engineers, working cross-functionally with SRE, MLOps, data engineering, security, and product teams to deliver instrumentation for model quality, latency, drift, cost, user experience, and ethical safeguards.

Requirements

  • 6+ years in Site Reliability Engineering, Observability Engineering, or MLOps with a focus on production-grade AI/ML systems.
  • Deep experience in instrumenting AI pipelines for observability.
  • Familiarity with prompt engineering, embeddings, vector DBs, and RAG-style architectures.
  • Hands-on experience with OpenTelemetry, Datadog, Sumo Logic, Prometheus, or similar tools.
  • Experience integrating observability into AI platforms.
  • Proficiency with Python, Go, or similar languages.
  • Familiarity with AWS services relevant to AI.
  • Experience deploying and observing third-party LLM APIs.
  • Strong background in Infrastructure-as-Code and CI/CD tooling.
  • Understanding of Kubernetes and container orchestration.
  • Experience with FinOps/cost optimization for AI workloads.
  • Strong understanding of ethical AI practices and responsible telemetry instrumentation.
  • Excellent collaboration skills and comfort leading across teams.
  • Experience mentoring or leading technical initiatives.
  • Communication skills for explaining complex AI concepts to non-technical stakeholders.

Responsibilities

  • Lead end-to-end observability design for AI/ML features in production and for internal use.
  • Instrument AI features in Tealium products for latency, accuracy, drift, usage, and cost.
  • Implement monitoring and cost tracking for third-party AI services.
  • Build telemetry pipelines to track LLM request/response metrics and prompt engineering observability.
  • Collaborate with data science and product teams to define and automate quality SLIs/SLOs for models.
  • Implement AI-aware tracing into the broader observability stack.
  • Participate in on-call rotations and help triage AI-specific incidents.
  • Automate validation pipelines to ensure AI features are robust across environments.
  • Establish dashboards and alerts for AI observability using various tools.
  • Contribute to ethical AI monitoring practices.

Benefits

  • Annual bonus and stock options.
  • Medical, dental, vision, life, and disability insurance.
  • 401(k) plan with company matching.
  • Flexible paid time off and extended paid parental leave.
  • 11 paid holidays annually.
  • 15 hours of paid work time for volunteer activities.
  • Sick leave accrual with carryover options.