Together.ai is a leader in developing AI infrastructure that powers the training of state-of-the-art models. We focus on creating scalable, efficient systems for handling massive datasets and managing large-scale distributed checkpoints, ensuring seamless workflows for training and fine-tuning AI models.

We are seeking a Training Dataset and Checkpoint Acceleration Engineer to optimize data pipelines and checkpoint mechanisms for large-scale machine learning workloads. In this role, you will work at the intersection of data engineering and distributed systems, ensuring that training workflows are highly performant, reliable, and cost-efficient.
Job Type: Full-time
Career Level: Mid Level
Education Level: No Education Listed
Number of Employees: 101-250 employees