About The Position

NVIDIA is seeking an experienced Senior Manager to lead the release of software and firmware for its Datacenter Software Tools team. This role is crucial for ensuring the quality and reliability of NVIDIA's DGX, HGX, and MGX servers, which are at the forefront of enterprise AI infrastructure. The position involves defining release scopes, developing end-to-end infrastructure and workflows, and collaborating with cross-functional teams to deliver high-quality, scalable, resilient, and secure release solutions. The ideal candidate will influence architecture and design decisions, partner with development, QA, and product engineering teams to "left-shift" release quality, and enforce quality metrics and KPIs. This role also includes owning the ingestion and packaging of software/firmware binaries, documenting procedures, and driving innovation in release processes, automation, and AI-assisted validation.

Requirements

  • 12+ years overall in the software industry, with a specialization in system software and/or firmware development.
  • 5+ years of proven hands-on technical leadership of multi-team organizations working on data center firmware such as BMC, FPGA, CPLD, and network switch firmware, including building infrastructure for continuous improvement of release quality.
  • BS/MS/PhD in CS, CE, EE, or a related technical field, or equivalent experience.
  • Prior experience in systems software or firmware development, with a proven history of guiding complex software features or products through the entire product life cycle, ideally on rack-scale datacenter products.
  • Strong understanding of computer system architecture, operating systems principles, HW-SW interactions, and performance analysis/optimizations.
  • Working fluency in Python and Linux sufficient to review designs, prototype tooling, and debug production issues alongside the team.
  • Hands-on experience with web application frameworks and CI/CD platforms (Jenkins, GitLab, Artifactory).
  • Track record of balancing multiple projects with competing priorities and delivering against measurable benchmarks (MTTR, specification compliance, release cadence, automation coverage).
  • Excellent communication and collaboration skills across teams and time zones.

Nice To Haves

  • Familiarity with the architecture of datacenter server software and experience with the in-band and out-of-band management of firmware and hardware components.
  • Understanding of the REST architectural style, especially JSON over HTTPS with OAuth, and of the DMTF PLDM and SPDM firmware management protocols.
  • Proven experience in developing a self-service release infrastructure, resulting in clear reductions in onboarding SLA times.
  • Experience integrating AI/LLM tooling into engineering workflows – for triage, test generation, code review, or release validation.
  • Experience leading geographically distributed engineering teams across the US and APAC.

Responsibilities

  • Provide leadership on how releases are delivered to end customers of rack-scale computing products built on tightly coupled compute and switch trays.
  • Build end-to-end infra and workflows to ensure the highest quality releases for data center firmware and software.
  • Define release scope for rack-scale products, working cross-functionally with product management, technical architects, and program management.
  • Deliver releases that flow through the validation matrix for customer end use cases, ensuring delivered firmware and software are of the highest quality.
  • Influence architecture, design, and implementation decisions for compute and switch tray software and firmware, ensuring quality across nightly, dev, and production drops for all customer use cases, with the right release-validation strategy at each phase of the development life cycle.
  • Partner with all matrixed organizations (development, SWQA, and product engineering) to left-shift release quality from dev to QA in a very fast-moving environment, using end-to-end CI/CD so that no bugs are found at customer sites.
  • Enforce release quality with well-placed metrics at every product milestone, and track and enforce KPIs published at a regular cadence.
  • Monitor and report progress of releases to all stakeholders.
  • Own ingestion and packaging of software and firmware binaries, readying them for deployment across multiple platforms at scale across different CSP environments.
  • Document procedures and engage in collaborative discussions to refine software and firmware release workflows, including identifying and resolving issues in release milestone packaging and deployment procedures and removing bottlenecks.
  • Shape the team's roadmap and drive innovation — including self-service interfaces, automation, AI-assisted validation and triage, and sophisticated release-compliance reporting.
  • Continuously review and identify improvement opportunities in established release processes, infrastructure, and practices.
  • Ensure the teams are performing in the most efficient and transparent way with a strong focus on automation and measurable targets.

Benefits

  • equity
  • benefits

What This Job Offers

  • Job Type: Full-time
  • Career Level: Senior
  • Number of Employees: 5,001-10,000
