Software Engineer, Research Data Platform

Anthropic•San Francisco, CA

Hybrid

About the Position

The Research Data Platform team builds the tools that Anthropic's researchers use every day to manage, query, and analyze the data that goes into training and evaluating frontier models. We power the internal applications researchers rely on to monitor reinforcement learning (RL) runs, explore finetuning datasets, and understand what's happening inside their experiments.

We're looking for engineers who love working directly with users and who excel at building data products: the pipelines that move data out of training runs into queryable storage, and the APIs, libraries, and services researchers use to manage and explore it. This role sits closer to the research workflow than a typical data infrastructure position. You'll often embed with research teams, build ML-specific tooling alongside them, and leverage what our Data Infrastructure team has already built rather than reinventing it.

We do not require prior ML or AI training experience. If you enjoy working closely with technical users, learning new domains quickly, and building tools people actually want to use, you'll pick up the research context fast.

Requirements

You may be a good fit if you:

Have significant software engineering experience, particularly building data-intensive applications or internal tooling
Enjoy working directly with users, gathering requirements iteratively, and shipping things that get adopted
Are results-oriented, with a bias towards flexibility and impact
Pick up slack, even if it goes outside your job description
Want to learn more about machine learning research
Care about the societal impacts of your work

Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience
Required field of study: A field relevant to the role as demonstrated through coursework, training, or professional experience
Minimum years of experience: Years of experience required will correlate with the internal job level requirements for the position

Nice to Haves

Experience with any of the following is a plus:

Large-scale ETL, columnar storage formats (e.g., Parquet), and query engines (e.g., Spark, BigQuery, DuckDB)
High-volume time-series data: ingestion, storage, and efficient querying
Data cataloging, lineage, or metadata management systems
ML experiment tracking or metrics platforms
Working in environments where engineers partner closely with quantitative users, such as research labs, trading firms, or observability and analytics startups
Complex data visualization and full-stack web application development

Responsibilities

Build and operate data pipelines that extract data from research training runs and land it in storage systems that are easy and fast to query
Work closely with researchers to design and build APIs, libraries, and web interfaces that support data management, exploration, and analysis
Develop dataset management, data cataloging, and provenance tooling that researchers use in their day-to-day work
Embed with research teams to understand their workflows, identify high-leverage tooling opportunities, and ship solutions quickly
Collaborate with adjacent teams to build on existing systems rather than reinventing them