Key Responsibilities

AI/ML Model Operations
Deploy, manage, and monitor machine learning and AI models in production environments.
Implement model performance monitoring including accuracy, latency, and inference metrics.
Detect and mitigate concept drift, data drift, and model degradation.

AI Observability
Design and implement AI observability frameworks to track model behavior and reliability.
Monitor LLM outputs, hallucination rates, and response quality.
Implement logging, tracing, and evaluation pipelines for AI systems.

Agentic Systems Monitoring
Monitor agent-based AI workflows and autonomous systems.
Track agent actions, tool usage, decision paths, and execution outcomes.
Implement guardrails, safety monitoring, and failure detection for AI agents.

Data Pipeline Monitoring
Monitor and maintain data ingestion, transformation, and feature pipelines.
Ensure data quality, schema consistency, and pipeline reliability.
Detect and resolve pipeline failures and anomalies.

Infrastructure & Automation
Build and maintain CI/CD pipelines for ML models and AI systems.
Manage model versioning, experiment tracking, and reproducibility.
Automate monitoring alerts, incident response, and remediation.

Collaboration
Work closely with data scientists, ML engineers, platform teams, and product teams.
Support continuous improvement of AI system reliability and governance.
Job Type
Full-time
Career Level
Mid Level
Education Level
No Education Listed
Number of Employees
1,001-5,000 employees