Data is playing an increasingly crucial role at the frontier of AI innovation. Many of the most meaningful advances in recent years have come not from new architectures, but from better data. As a member of the Data Team, your mission is to build and operate the ingestion systems that turn the open web and other large-scale data sources into reliable, well-structured corpora for training frontier models. You will own the machinery that acquires, extracts, normalizes, versions, and delivers data to our pre-training pipelines. You’ll work directly with world-class researchers to close the loop between what we collect and how it impacts model performance. This role is ideal for engineers who love building robust distributed systems, but who also want to run experiments, reason about tradeoffs in data acquisition, and iterate quickly based on measurable impact. Working closely with our pre-training and data quality teams, you will:
Stand Out From the Crowd
Upload your resume and get instant feedback on how well it matches this job.
Job Type
Full-time
Career Level
Mid Level
Education Level
No Education Listed
Number of Employees
51-100 employees