Machine Learning Engineer III

Workday • Pleasanton, CA

53d • Hybrid

About The Position

This opening is on the AI Platform team, specifically within the Information Retrieval and Agent Evaluation team. The team is part of a global, high-growth technology company and has the opportunity to develop the next generation of Workday’s collaborative products, supporting a customer base of more than 31 million.

The Agent Evaluation Platform project is the "Ground Truth" engine for Workday’s AI transformation. As Workday infuses AI Agents into every facet of its enterprise suite, this team provides the critical infrastructure and algorithms needed to prove that those agents work and to make them better. The team builds the platform that gives agent engineering teams rigorous, data-driven optimization, evaluation, and validation of their agents.

The AI Platform Information Retrieval products are at the heart of Workday’s intelligence layer, bridging the gap between human language, search, and enterprise data, including reasoning over knowledge. These products use advanced semantic search to navigate Workday’s massive data model and turn natural language questions into precise SQL and Python executions.

The AI Platform organization is bringing “AI first” products to life at every step of the Workday product offering. They are looking for highly creative, results-focused, and deeply skilled Machine Learning Engineers and Scientists to work on a range of these challenges.

Requirements

3+ years of experience researching, developing, and deploying production-grade ML systems, including expertise in deep learning, NLP, Information Retrieval, and recommender systems using frameworks like PyTorch or TensorFlow.
Proven track record of building and evaluating NLP and LLM-powered products, including expertise in RAG architectures, agentic frameworks (e.g., LangChain/LangGraph), and long-context LLM applications (e.g., Text-to-SQL).
2+ years of Python experience with a focus on modular library design, asynchronous patterns, and scalable system architecture (state management/error handling) for non-deterministic AI outputs.

Nice To Haves

Advanced degree (Master’s or Ph.D.) in a quantitative field or a strong portfolio of peer-reviewed research publications.
Proficiency with tools and techniques such as DSPy, reinforcement learning, imitation learning, graph neural networks, multi-modal models, and large-scale data processing (PySpark, SQL).
A "test-everything" mindset with experience in A/B testing, Knowledge Graphs, and "Golden Dataset" curation for model benchmarking.
Hands-on experience with the full ML lifecycle, including model fine-tuning (PEFT), evaluation frameworks (e.g., DeepEval/RAGAS), and cloud-native deployment (Docker/K8s, AWS/GCP).
Demonstrated ability to lead cross-functional teams, mentor junior engineers, and solve ambiguous problems with high autonomy.

Responsibilities

Architect Agentic AI: Design and deploy sophisticated reasoning, planning, and swarm agents that interact seamlessly with enterprise data and support continuous, lifelong learning.
Drive Meta-ML & Optimization: Develop algorithms for automated node-level optimization within agent graphs, identifying the best LLM and prompt configurations for every workflow step. Build recommender systems for engineering teams to drive optimal evaluation for their agents.
Advance Information Retrieval: Build hybrid, agentic search systems and semantic parsing products (Text-to-SQL/Python) utilizing vector search, reasoning, and fine-tuning for structured output.
Scale Evaluation & Observability: Engineer cloud-based pipelines (Kubeflow) and A/B testing frameworks for rigorous offline/online evaluation, failure attribution, and safety monitoring.
Lead the ML Lifecycle: Own the end-to-end MLOps process—from exploration and prompt engineering to scalable production deployment—ensuring high-quality, reliable performance.
Define Strategic Roadmaps: Independently identify ML opportunities, propose high-impact solutions to leadership, and integrate industry best practices across the organization.
Collaborate with Autonomy: Work cross-functionally with PMs and Engineers to deliver "AI-first" products, enjoying full ownership of your work within a supportive, growth-oriented culture.