Senior Software Engineer, Compute Architecture

CoreWeave
Sunnyvale, NY
Hybrid

About The Position

As a Senior Software Engineer within our Compute Architecture organization, you will help build the software control plane for hardware lifecycle management across large-scale GPU data centers. The METALDEV team builds Go-based distributed services that bring infrastructure online, monitor production hardware health, automate safe operational workflows, and give operators the observability and control needed to manage GPU servers and rack-scale systems with reliability and confidence. This is a software-first role at the intersection of distributed systems, production reliability, and hardware-aware automation, ideal for engineers who want their code to operate real-world infrastructure at massive scale.

Requirements

  • 5+ years of experience building and operating infrastructure or backend systems.
  • Bachelor’s or Master’s degree in Computer Science or a related field, or equivalent practical experience.
  • Strong proficiency in Go for building production services and tools.
  • Experience designing and building gRPC and REST APIs.
  • Experience with Kubernetes and containerized workloads in production environments.
  • Familiarity with observability tooling such as Prometheus and Grafana.

Nice To Haves

  • Experience working with GPU-based systems.
  • Experience with low-level hardware management such as BMCs or Redfish.
  • Experience operating large-scale distributed systems or high-throughput infrastructure.
  • Experience collaborating with or contributing to open-source projects (for example, Go, Redfish).

Responsibilities

  • Design, build, and operate Go-based services that manage the lifecycle of large-scale GPU data center infrastructure.
  • Build automation for data center bring-up, hardware discovery, health monitoring, remediation, and production operations.
  • Develop reliable APIs, services, and workflows for managing BMCs, firmware state, server health, and rack-level infrastructure.
  • Improve observability, alerting, and operational tooling so production issues can be detected, understood, and resolved quickly.
  • Translate incidents and hardware failure modes into software improvements that make the platform more resilient.
  • Partner with hardware-adjacent, infrastructure, operations, and software teams to design systems that work safely at fleet scale.

Benefits

  • Medical, dental, and vision insurance - 100% paid for by CoreWeave
  • Company-paid Life Insurance
  • Voluntary supplemental life insurance
  • Short- and long-term disability insurance
  • Flexible Spending Account
  • Health Savings Account
  • Tuition Reimbursement
  • Ability to Participate in Employee Stock Purchase Program (ESPP)
  • Mental Wellness Benefits through Spring Health
  • Family-Forming support provided by Carrot
  • Paid Parental Leave
  • Flexible, full-service childcare support with Kinside
  • 401(k) with a generous employer match
  • Flexible PTO
  • Catered lunch each day in our office and data center locations
  • A casual work environment
  • A work culture focused on innovative disruption