Staff Software Engineer, Infrastructure - Weights & Biases

Weights & Biases | Sunnyvale, CA
4d | $188,000 - $275,000 | Hybrid

About The Position

CoreWeave, the AI Hyperscaler™, acquired Weights & Biases to create the most powerful end-to-end platform to develop, deploy, and iterate AI faster. Since 2017, CoreWeave has operated a growing footprint of data centers covering every region of the US and across Europe, and was ranked as one of the TIME100 most influential companies of 2024. By bringing together CoreWeave’s industry-leading cloud infrastructure with the best-in-class tools AI practitioners know and love from Weights & Biases, we’re setting a new standard for how AI is built, trained, and scaled.

The integration of our teams and technologies is accelerating our shared mission: to empower developers with the tools and infrastructure they need to push the boundaries of what AI can do. From experiment tracking and model optimization to high-performance training clusters, agent building, and inference at scale, we’re combining forces to serve the full AI lifecycle, all in one seamless platform.

Weights & Biases has long been trusted by over 1,500 organizations, including AstraZeneca, Canva, Cohere, OpenAI, Meta, Snowflake, Square, Toyota, and Wayve, to build better models, AI agents, and applications. Now, as part of CoreWeave, that impact is amplified across a broader ecosystem of AI innovators, researchers, and enterprises.

As we unite under one vision, we’re looking for bold thinkers and agile builders who are excited to shape the future of AI alongside us. If you're passionate about solving complex problems at the intersection of software, hardware, and AI, there's never been a more exciting time to join our team.

At marimo, we’re on a mission to make the world’s best open-source programming environment for working with data, together with a world-class, scalable, and performant cloud-hosted counterpart. marimo is a reinvention of the Python notebook: it feels like a next-gen reactive Python notebook, but is stored as pure Python that can be versioned with Git, deployed as a data app, run as a script, and reused as a module. Incubated initially with scientists at Stanford, marimo is now used by leading companies, labs, and universities around the world, with millions of downloads since launch and over 17,000 stars on our GitHub repository.

On the marimo team, we believe that the tools we use shape the way we think: better tools, for better minds. If you're passionate about developer tools, thrive in ambiguity, enjoy solving challenging engineering problems at scale, and yearn for greenfield problems, you'll fit right in.

The molab team develops a cloud-hosted marimo notebook service called molab. molab lets anyone in the world experiment with data, build interactive apps, and share their work for free, all using the open-source marimo notebook and accelerated by AI-assisted coding. This is a high-impact team and product, undergoing rapid growth and working on many challenging greenfield problems, with an emphasis on building highly available, low-latency, and fault-tolerant systems.

About The Role

As a staff engineer on marimo's core open-source team, you will co-design and implement the backend architecture of molab, solving for high availability, low latency (both the ability to rapidly spin up and spin down notebook kernels on demand and low-latency communication between the notebook frontend and backend), stability, and fraud and abuse. You will design molab to run on CoreWeave's specialized Kubernetes-based clusters and integrate with CoreWeave object storage, and will solve for keeping GPU utilization high.

Requirements

  • 8+ years of experience in software engineering
  • Strong fundamentals that are language agnostic
  • Expertise in computer systems, including parallel computing (threading, multiprocessing), concurrency (asynchronous programming), networking/inter-process communication
  • Experience with containerization, container orchestration (Kubernetes), scheduling, networked filesystems, resource allocation, distributed systems, and cloud infrastructure
  • Experience building highly available, fault-tolerant systems
  • Strong communication skills, written and verbal

Nice To Haves

  • Proficiency with Python and Python packaging
  • Basic experience with or awareness of the Python stack for AI/ML
  • Empathy for practitioners and researchers in AI, ML, data engineering, NLP, or other quantitative work
  • Experience with GPU resource allocation and sharing

Benefits

  • Medical, dental, and vision insurance - 100% paid for by CoreWeave
  • Company-paid Life Insurance
  • Voluntary supplemental life insurance
  • Short and long-term disability insurance
  • Flexible Spending Account
  • Health Savings Account
  • Tuition Reimbursement
  • Ability to participate in the Employee Stock Purchase Program (ESPP)
  • Mental Wellness Benefits through Spring Health
  • Family-Forming support provided by Carrot
  • Paid Parental Leave
  • Flexible, full-service childcare support with Kinside
  • 401(k) with a generous employer match
  • Flexible PTO
  • Catered lunch each day in our office and data center locations
  • A casual work environment
  • A work culture focused on innovative disruption