About The Position

Key Responsibilities

AI/ML Model Operations: Deploy, manage, and monitor machine learning and AI models in production environments. Implement model performance monitoring including accuracy, latency, and inference metrics. Detect and mitigate concept drift, data drift, and model degradation.

AI Observability: Design and implement AI observability frameworks to track model behavior and reliability. Monitor LLM outputs, hallucination rates, and response quality. Implement logging, tracing, and evaluation pipelines for AI systems.

Agentic Systems Monitoring: Monitor agent-based AI workflows and autonomous systems. Track agent actions, tool usage, decision paths, and execution outcomes. Implement guardrails, safety monitoring, and failure detection for AI agents.

Data Pipeline Monitoring: Monitor and maintain data ingestion, transformation, and feature pipelines. Ensure data quality, schema consistency, and pipeline reliability. Detect and resolve pipeline failures and anomalies.

Infrastructure & Automation: Build and maintain CI/CD pipelines for ML models and AI systems. Manage model versioning, experiment tracking, and reproducibility. Automate monitoring alerts, incident response, and remediation.

Collaboration: Work closely with data scientists, ML engineers, platform teams, and product teams. Support continuous improvement of AI system reliability and governance.

Responsibilities

  • Deploy, manage, and monitor machine learning and AI models in production environments.
  • Implement model performance monitoring including accuracy, latency, and inference metrics.
  • Detect and mitigate concept drift, data drift, and model degradation.
  • Design and implement AI observability frameworks to track model behavior and reliability.
  • Monitor LLM outputs, hallucination rates, and response quality.
  • Implement logging, tracing, and evaluation pipelines for AI systems.
  • Monitor agent-based AI workflows and autonomous systems.
  • Track agent actions, tool usage, decision paths, and execution outcomes.
  • Implement guardrails, safety monitoring, and failure detection for AI agents.
  • Monitor and maintain data ingestion, transformation, and feature pipelines.
  • Ensure data quality, schema consistency, and pipeline reliability.
  • Detect and resolve pipeline failures and anomalies.
  • Build and maintain CI/CD pipelines for ML models and AI systems.
  • Manage model versioning, experiment tracking, and reproducibility.
  • Automate monitoring alerts, incident response, and remediation.
  • Work closely with data scientists, ML engineers, platform teams, and product teams.
  • Support continuous improvement of AI system reliability and governance.

Benefits

  • Medical, vision, and dental benefits
  • 401(k) retirement plan
  • Variable pay/incentives
  • Paid time off
  • Paid holidays

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: No Education Listed
  • Number of Employees: 1,001-5,000 employees
