About The Position

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing, an era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you’ll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world.

We are seeking a Senior Systems Software Engineer to join our advanced infrastructure software team. In this role, you will be responsible for designing, developing, and maintaining high-performance, rack-scale management solutions for datacenter environments. You will work primarily in Rust, Go, and C++, building robust, scalable systems that bridge hardware, firmware, and cloud-native services.

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or a related field (or equivalent experience).
  • 5+ years of experience in systems software engineering with a focus on distributed systems, software/firmware development, or infrastructure automation.
  • Strong hands-on experience with Rust, Go, and C++ for systems-level development.
  • Datacenter or computer architecture experience is required—you should understand server, rack, and network topologies, as well as hardware/firmware/software interactions.
  • Experience with hardware management interfaces and protocols (Redfish, IPMI, BMC) and firmware update automation.

Nice To Haves

  • Experience with rack-scale or datacenter management platforms.
  • Familiarity with test automation, simulation/mocking frameworks, and CI/CD pipelines.
  • Knowledge of hardware validation, health monitoring, and diagnostics (DCGM, nvbandwidth, Field Diag).
  • Contributions to open-source infrastructure or systems software projects.

Responsibilities

  • Systems Software Development: Architect, implement, and maintain core components of an internally developed IaaS (Infrastructure-as-a-Service) product and related microservices primarily in Rust, C++, or Go.
  • Hardware/Firmware Integration: Develop and automate workflows for device discovery, firmware updates, and health monitoring using protocols such as Redfish and other BMC interfaces.
  • Distributed Systems: Build and extend distributed microservices and gRPC APIs for rack management, supporting multi-rack, multi-tenant, and multi-site deployments.
  • Telemetry & Health Monitoring: Implement telemetry collection, aggregation, and analysis pipelines using Prometheus, OpenTelemetry, and Grafana; contribute to Health-as-a-Service initiatives.