This role focuses on optimizing the foundation model training stack to reduce wall-clock time to convergence. Responsibilities include designing and building distributed training systems, writing low-level performance-critical code, and tuning workloads for hardware efficiency. The role also includes developing tools for monitoring and debugging large-scale runs.
Job Type: Full-time
Career Level: Senior
Education Level: No Education Listed