About The Position

NVIDIA’s invention of the GPU in 1999 fueled the growth of PC gaming, redefined modern computer graphics, and revolutionized parallel computing. GPU deep learning has since ignited a new chapter in computing, powering AI systems that can perceive and interpret the world. Today, NVIDIA is recognized as the AI computing company, and we’re continuing to expand our teams with outstanding talent. NVIDIA DGX, HGX, and MGX servers power the world’s foremost enterprise AI infrastructure. As a Senior System Software Engineer for Datacenter Software and Firmware Release Lifecycle Management, you will develop infrastructure using CI/CD automation pipelines. These pipelines enable workflows for assembling, validating, and delivering NVIDIA’s firmware and software across all of NVIDIA’s GPU-based servers. You will partner closely with software and firmware development, QA, and diagnostics teams to ensure reliable, efficient, and high-quality releases.

Requirements

  • BS or MS in Computer Science, Computer Engineering, or a related field (or equivalent experience) with 5+ years of relevant software and firmware release management experience.
  • Strong programming skills in Python.
  • Hands-on experience using CI/CD tools like Jenkins, GitLab CI, or similar.
  • Familiarity with containerization and orchestration (Docker, Kubernetes).
  • Proven understanding of source control (Git) and agile development workflows.
  • Passion for automation, scalable software, and continuous learning.
  • Strong problem-solving skills and attention to software quality and maintainability.

Nice To Haves

  • Proven record of software and firmware release engineering for multiple customers.
  • Experience triaging and managing defects in system software and firmware releases.
  • Experience releasing for x86_64 and arm64 architectures.
  • Prior experience working on datacenter products, with good knowledge of system software and Linux environments.
  • Knowledge of web and backend frameworks (React, Angular, Flask, Spring, or similar) and artifact management tools (JFrog, Nexus).

Responsibilities

  • Collaborate with firmware, hardware, software, and QA teams to gather requirements and help deliver quality firmware and software release solutions.
  • Support the ingestion and packaging of software and firmware binaries to prepare them for deployment across various platforms.
  • Build, develop, and maintain tools and infrastructure for the software and firmware release lifecycle.
  • Implement automation pipelines that build, test, and deploy software and firmware artifacts, and assist with system tools like Jenkins, Docker, and Kubernetes.
  • Document release notes and communicate effectively with internal and external collaborators.
  • Troubleshoot and debug software and firmware packaging and deployment processes.
  • Provide tested releases of software and firmware to partners.
  • Document processes and contribute to team discussions to improve firmware release workflows.