Senior Systems Software Engineer

NVIDIA, Boulder, CO

About The Position

It’s an exciting time to join the NVIDIA Cloud Native Engineering (NVCNE) group’s backend software team. As a Cloud Platform Software Engineer, you will work closely with architects, designers, frontend engineers, SREs, and other technical leaders to build a software platform that powers the lifecycle of AI supercomputing infrastructure on Kubernetes. Together, we will enable scalable and resilient AI services across the cloud.

You will design and implement software aligned with the architectural vision for the NVIDIA Cloud Platform, contributing to core features and capabilities. You will own your work end‑to‑end, from development to test, deployment, and production support. This role includes partnering with SRE and product teams to troubleshoot complex distributed systems and drive operational excellence. You are expected to follow and extend NVIDIA’s Cloud Native development practices, with a strong focus on Kubernetes. This position offers the opportunity to shape world‑class AI infrastructure and make a tangible impact at global scale.

NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions, from artificial intelligence to autonomous cars. NVIDIA is looking for great people like you to help us accelerate the next wave of artificial intelligence. NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people on the planet working for us. If you’re a creative, curious, and driven technical leader, we want to hear from you!

NVIDIA is the world leader in accelerated computing. NVIDIA pioneered accelerated computing to tackle challenges no one else can solve. Our work in AI and digital twins is transforming the world’s largest industries and profoundly impacting society. Learn more about NVIDIA.

Requirements

  • BS in Computer Science, Information Systems, Computer Engineering or equivalent experience.
  • 8+ years of professional experience.
  • 3–5 years of hands-on experience in large-scale software development using modern languages and frameworks.
  • Strong proficiency in Golang for developing Kubernetes operators, controllers, and custom tools.
  • Proven experience building, deploying, and scaling services on Kubernetes, including work with CRDs and auto-scaling infrastructure.
  • Expertise with cloud-native infrastructure and managed Kubernetes services across AWS, GCP, Azure, and OCI.
  • Demonstrated ability to collaborate with cross-functional teams to deliver performant, reliable cloud services at scale.
  • Experience participating in incident response, performing root cause analysis, and implementing preventive measures to improve reliability.
  • Excellent communication and troubleshooting skills across infrastructure, Kubernetes, and application runtime layers, with the ability to articulate design decisions and quality strategies clearly.

Nice To Haves

  • Hands-on experience with Kubernetes Cluster API, Terraform, CSP APIs, and related infrastructure automation tooling.
  • Proficiency with Kustomize or other Kubernetes packaging and deployment tools, with the ability to refactor software for containerized and orchestrated environments.
  • Familiarity with CNI, CSI, and CRI interfaces and the broader CNCF ecosystem and its evolving tooling.
  • Demonstrated background in open-source software, including active participation or upstream contributions to community projects.
  • Strong understanding of modern infrastructure design and deployment patterns across cloud-native and hybrid environments.

Responsibilities

  • Develop software systems to support large-scale deployments of cloud infrastructure.
  • Design, develop, and distribute APIs to support Infrastructure as Code (IaC) automation and deployment workflows.
  • Contribute to multiple source code projects to deliver software services that meet NVIDIA requirements.
  • Collaborate with engineering managers, architects, designers, and frontend engineers to deliver high-quality software.
  • Automate the validation of software solutions with unit and integration tests.
  • Innovate with other engineers on proposed designs and product direction.
  • Openly share successes and failures in a no-blame environment.

Benefits

  • Equity and benefits