Machine Learning Engineer

Adobe
San Jose, CA
$102,400 - $202,250

About The Position

The Opportunity

Adobe Journey Optimizer (AJO) powers personalized, real-time customer experiences at massive scale for global brands. Our Reliability Engineering & Operational Intelligence (REOI) team is building AJO's autonomous operating system: an AI-native platform that proactively improves product quality, accelerates issue resolution, and enhances customer experience through intelligent automation and continuous learning.

We are seeking a Machine Learning Engineer who is eager to apply ML and AI to solve real challenges in reliability, quality, and operational intelligence at scale. In this role, you will build AI systems that make AJO progressively more reliable and self-healing by learning from every incident, preventing recurring failures, and ensuring exceptional customer experiences, while enabling the platform to scale 4x without scaling operational overhead.

This is a unique opportunity to work at the intersection of production systems, AI/ML, and product quality, where your work directly impacts how millions of customer journeys are delivered reliably every day.

Requirements

  • BS/MS in Computer Science, Machine Learning, Data Science, or related field, with 2-4 years of professional experience (or strong academic/internship experience in ML/AI applied to real-world problems)
  • Hands-on experience with Python and ML frameworks: scikit-learn, PyTorch, TensorFlow, HuggingFace, or LangChain
  • Practical knowledge of LLM APIs (OpenAI, Anthropic Claude, Azure OpenAI) and prompt engineering techniques for building agentic workflows
  • Understanding of vector databases and similarity search (FAISS, Pinecone, ChromaDB, MongoDB Atlas Vector Search, or similar)
  • Foundational knowledge of ML concepts: embeddings, clustering, classification, evaluation metrics (precision/recall/F1), and model deployment best practices
  • Comfortable building APIs and integrating ML models into backend services using FastAPI, Flask, or similar frameworks
  • Eagerness to learn production ML operations: model monitoring, A/B testing, continuous evaluation, and safety guardrails for AI systems
  • Strong problem-solving skills, attention to detail, and the ability to iterate quickly based on data and feedback
  • Excellent communication and collaboration — able to explain ML concepts to non-ML engineers and translate business requirements into technical solutions

Nice To Haves

  • Experience with Kubernetes, observability tools (Prometheus, Grafana, Datadog), incident management systems, or building AI agents for operational use cases

Responsibilities

  • Build AI-powered systems that improve the quality, reliability, and customer experience of AJO — by automating issue detection and resolution with human-in-the-loop approval, learning from operational patterns to prevent recurring failures, and providing real-time visibility into customer health and platform stability
  • Develop intelligent knowledge systems that compound expertise over time — using vector embeddings, similarity retrieval, and pattern clustering to ensure every incident investigation builds on past learnings, making the platform progressively smarter and more self-healing
  • Design and implement LLM-based workflows using prompt engineering, structured outputs, tool calling, and agentic reasoning patterns to create autonomous capabilities that operate safely at production scale
  • Build evaluation frameworks to measure AI system performance: quality improvement rates, automation success rates, mean time to resolution (MTTR) reduction, and customer impact metrics
  • Integrate AI capabilities with production infrastructure: Kubernetes, Prometheus, Splunk, GitHub, and 30+ operational data sources — creating closed-loop systems that detect, learn, and act autonomously
  • Apply ML techniques to operational data: anomaly detection for early issue detection, time-series forecasting for capacity planning, pattern clustering for recurring failure identification, and predictive analysis for proactive prevention
  • Collaborate with SREs, software engineers, and product teams to understand quality and reliability challenges, then design and deploy AI solutions that address them systematically
  • Contribute to code reviews, testing, documentation, and CI/CD pipelines — building production-grade ML systems with the same rigor as mission-critical infrastructure

Benefits

  • Comprehensive benefits programs