We are looking for a Machine Learning Data Engineer to join our Applied Science Data Frameworks team, which builds the foundational infrastructure that powers large-scale multimodal AI training and inference. This role is ideal for someone with strong distributed systems and data engineering fundamentals who is eager to work in an ML-adjacent environment, contributing to training data loaders, distributed inference frameworks, feature enrichment pipelines, and dataset management systems that enable ML teams to train foundation models at petabyte scale.

You'll work on high-impact projects involving distributed data loading for PyTorch training workloads, batch inference pipelines for feature enrichment, semantic search infrastructure for dataset discovery, and production-grade ML data pipelines that support generative AI model development. Your systems will process billions of images, videos, and other multimodal content across large-scale GPU clusters.

If you're excited about building distributed data frameworks, optimizing data pipelines at scale, and growing your expertise in ML infrastructure, we'd love to hear from you.
Job Type
Full-time
Career Level
Mid Level