Machine Learning Engineer III

Workday•Pleasanton, CA

1d•$136,200 - $240,000•Hybrid

About The Position

This is an exciting opening on the AI Platform team, specifically within the Information Retrieval and Agent Evaluation team. The team is part of a global, high-growth technology company and has the opportunity to develop the next generation of Workday’s groundbreaking collaborative products, supporting a customer base of more than 31 million users.

The Agent Evaluation Platform project is central to Workday’s AI transformation, providing the critical infrastructure and algorithms to prove and improve AI agents as they are infused into Workday’s enterprise suite. The AI Platform Information Retrieval products are at the heart of Workday’s intelligence layer, bridging human language, search, and enterprise data, including reasoning over knowledge. These products use advanced semantic search to navigate Workday’s data model and translate natural language questions into precise SQL and Python executions.

The AI Platform organization is bringing “AI-first” products to life across the Workday product offering. We are looking for creative, results-focused, and skilled Machine Learning Engineers/Scientists to work on a range of challenges. Workday offers unique advantages: access to exclusive, high-integrity enterprise datasets; the opportunity to work at the frontier of Agentic AI (validating, scaling, and optimizing agents, and extracting correct data for them); and the chance to have your code empower the world’s largest companies to make data-driven decisions. The culture is described as "people-first," balancing high-intensity innovation with sustainable work-life integration.

Requirements

3+ years of experience researching, developing and deploying production-grade ML systems, including expertise in deep learning, NLP, Information Retrieval, and recommender systems using frameworks like PyTorch or TensorFlow.
Proven track record of building and evaluating NLP and LLM-powered products, including expertise in RAG architectures, agentic frameworks (e.g., LangChain/LangGraph), and long-context LLM applications (e.g., Text-to-SQL).
2+ years of Python experience with a focus on modular library design, asynchronous patterns, and scalable system architecture (state management/error handling) for non-deterministic AI outputs.

Nice To Haves

Advanced degree (Master’s or Ph.D.) in a quantitative field or a strong portfolio of peer-reviewed research publications.
Proficiency with tools and techniques such as DSPy, reinforcement learning, imitation learning, graph neural networks, and multi-modal models.
A "test-everything" mindset with experience in A/B testing, Knowledge Graphs, and "Golden Dataset" curation for model benchmarking.
Proficiency in large-scale data processing (PySpark, SQL).
Hands-on experience with the full ML lifecycle, including model fine-tuning (PEFT), evaluation frameworks (e.g., DeepEval/RAGAS), and cloud-native deployment (Docker/K8s, AWS/GCP).
Demonstrated ability to lead cross-functional teams, mentor junior engineers, and solve ambiguous problems with high autonomy.

Responsibilities

Architect Agentic AI: Design and deploy sophisticated reasoning, planning, and swarm agents that interact seamlessly with enterprise data and support continuous, lifelong learning.
Drive Meta-ML & Optimization: Develop algorithms for automated node-level optimization within agent graphs, identifying the best LLM and prompt configurations for every workflow step. Build recommender systems for engineering teams to drive optimal evaluation for their agents.
Advance Information Retrieval: Build hybrid, agentic search systems and semantic parsing products (Text-to-SQL/Python) utilizing vector search, reasoning, and fine-tuning for structured output.
Scale Evaluation & Observability: Engineer cloud-based pipelines (Kubeflow) and A/B testing frameworks for rigorous offline/online evaluation, failure attribution, and safety monitoring.
Lead the ML Lifecycle: Own the end-to-end MLOps process—from exploration and prompt engineering to scalable production deployment—ensuring high-quality, reliable performance.
Define Strategic Roadmaps: Independently identify ML opportunities, propose high-impact solutions to leadership, and integrate industry best practices across the organization.
Collaborate with Autonomy: Work cross-functionally with PMs and Engineers to deliver "AI-first" products, enjoying full ownership of your work within a supportive, growth-oriented culture.