CoreWeave is seeking a Senior Software Engineer II to join the Applied Training team. The team addresses a common problem: customers spend valuable research time on cluster setup and operations instead of on training AI models. Its goal is to give every CoreWeave customer research infrastructure comparable to what frontier labs use, so that customers have what they need to succeed.

As an early member of this small team, you will own our Kubernetes-native research cluster platform, the sandbox client for agentic training and evaluation, or potentially a new project. You will contribute to the team's roadmap and work closely with customers and with the internal teams that build cloud-native primitives.

Specific responsibilities may include designing and building a complete research cluster experience (CLI, job configuration schema, Kubernetes operators, and daemons) that addresses researcher pain points such as code distribution, checkpoint-triggered evaluation, cross-cluster scheduling, and programmatic job control. Alternatively, you might own the Python SDK for sandbox infrastructure, which enables large-scale reinforcement learning (RL) training runs in isolated containers.

You will also write documentation for running popular open-source (OSS) training frameworks on CoreWeave, and collaborate with infrastructure teams and with customers (large AI labs running thousands of GPUs) to understand their supercomputing stacks and integrate that knowledge into our offerings.
Job Type: Full-time
Career Level: Senior
Education Level: No Education Listed