Machine Learning Eval Engineer

Reducto•San Francisco, CA

Onsite

About The Position

Reducto helps AI teams ingest real-world enterprise data with state-of-the-art accuracy. Most enterprise data, from financial statements to health records, is locked in unstructured file formats like PDFs and spreadsheets. We train vision models to read those documents the way a human would, enabling teams to build products, train models, and automate processes at scale.

We’ve grown rapidly, increasing revenue 7x year over year and partnering with hundreds of companies, from leading AI teams like Harvey, Vanta, and Scale, to enterprise customers across FAANG and top trading firms. Reducto has raised over $100M from world-class investors including a16z, Benchmark, and First Round Capital.

As an ML Eval Engineer, you’ll play a key role in building the evaluation systems and benchmarks that make Reducto’s models better over time. You’ll collaborate closely with our ML, platform, and GTM teams to identify model weaknesses, design strong benchmarks, and create metrics and tooling that surface new failure modes as we scale. This is a high-impact role where you’ll help define how model quality is measured at Reducto and shape the systems we use to improve it.

Requirements

Hold yourself to a high bar for quality and precision.
Enjoy solving complex problems and building from first principles.
Have strong Python skills and can independently build clean, reliable technical solutions.
Are comfortable working with data infrastructure such as AWS S3 and OLAP or analytics systems like Tinybird.
Love getting your hands dirty with unstructured data and chasing down difficult failure cases.
Operate well in fast-changing, high-growth environments.
Collaborate effectively across technical and non-technical teams.
Take full ownership from strategy through execution.

Nice To Haves

Have experience at an early-stage or high-growth startup.
Have some background in product thinking and can build simple, polished user-facing interfaces.
Are comfortable working directly with customers to understand their workflows and data needs.
Have experience in AI/ML, data infrastructure, enterprise software, or document understanding systems.
Care deeply about combining technical excellence with business impact.

Responsibilities

Design, build, and maintain evaluation benchmarks that reveal where our models perform well and where they fail.
Develop metrics, heuristics, and workflows to automatically identify new failure modes across large and messy real-world datasets.
Partner closely with other ML engineers to turn evaluation insights into model improvements and better training priorities.
Work hands-on with unstructured enterprise data, including PDFs, spreadsheets, and other difficult document formats, to uncover edge cases and hard examples.
Build lightweight internal and user-facing tools, including simple interfaces in Python frameworks like Flask, to help teams inspect results, analyze model behavior, and communicate evaluation outcomes.
Collaborate with customers and internal teams to understand real-world data needs and create bespoke benchmarks that highlight Reducto’s strengths.

Benefits

Unlimited PTO: We believe great work requires recharging.
Lunch: Receive a free daily lunch to eat with your teammates at the office.
Reimbursed Transportation: Provide us with your receipts and we’ll take care of the costs.
Insurance: Generous health insurance covering medical, dental, and vision.
Health and Wellness Budget: We provide up to $150/mo reimbursement for health and wellness spending, such as gym memberships, fitness classes, or similar.
Parental Leave: Work with us to build a leave schedule that works for you and your family.
