Machine Learning Engineer III, Core Agents

Box•Redwood City, CA

$175,500 - $219,500•Hybrid

About The Position

AI is transforming how enterprises work, and Box is building an enterprise-grade Agents Platform at the core of the Box Content Cloud. Our platform, built on LangGraph, enables teams across Box and our customers to design, deploy, and operate AI agents that handle real-world enterprise workflows, from content understanding and generation to intelligent metadata, automation, and complex, multi-step orchestrations.

As a founding ML Engineer on the Core Agents team, you will build and evaluate the foundational agents that power the Box AI ecosystem, including DeepSearch, DeepResearch, Extract, and Compose. You’ll design techniques for intent detection, ranking, evaluation, retrieval-augmented generation (RAG), and multi-agent orchestration, while also establishing metrics and evaluation frameworks to measure agent quality. Your work will shape how agents retrieve, reason, and act on enterprise content with high accuracy and trustworthiness.

You’ll collaborate closely with platform engineers to build the core components of the Agents Platform that enable these agents to run at scale, while also empowering other Box teams and customers to configure and customize agents for their workflows.

Requirements

3+ years of industry experience building or evaluating ML-powered systems.
Strong background in machine learning, information retrieval, or natural language processing.
Proficiency with at least one programming language such as Python, Java, or Scala.
Experience designing, training, and evaluating ML models in production.
Familiarity with retrieval systems, ranking models, RAG pipelines, or intent classification.
BS degree in Computer Science, Machine Learning, or a related field.

Nice To Haves

Advanced degree in Computer Science, Machine Learning, or a related field.
Hands-on experience with LangChain, LangGraph, or other agent frameworks.
Familiarity with LLMs, embeddings, semantic search, indexing, and relevance optimization.
Experience with cloud-based ML platforms such as Vertex AI, Amazon Bedrock, or Amazon SageMaker.
Experience with Kubernetes-based systems for deploying and scaling ML workloads.
Research or applied experience in evaluation of generative AI systems (factuality, safety, grounding).

Responsibilities

Build, evaluate, and evolve foundational agents such as DeepSearch, DeepResearch, Extract, and Compose.
Develop techniques for intent detection, query understanding, ranking, and RAG to improve accuracy and relevance.
Define metrics, evaluation pipelines, and benchmarks for agent quality, including precision/recall, factual grounding, and latency trade-offs.
Research and implement best practices in retrieval, orchestration, and evaluation of multi-agent workflows.
Collaborate with platform engineers to design core components that enable secure, reliable, and scalable deployment of agents.
Partner with product teams to translate enterprise use cases into agentic solutions, ensuring measurable improvements in user experience.
Contribute to technical discussions, share research insights, and help define the roadmap for Box’s agent ecosystem.