About The Position

We're looking for a Principal Software Engineer to join our CSP Engagements team as the technical focal point for GPU firmware and GPU system software. You will work directly with the engineering teams of key cloud service provider (CSP) and hyperscale customers to ensure they can reliably manage, update, and operate NVIDIA GPU firmware at fleet scale. You will drive work streams with these teams to build a shared understanding of GPU firmware and system software integration, incorporate their feedback into NVIDIA's feature roadmap and delivery plan, and ensure customer-side automation and recovery procedures are ready before each firmware release. Your cross-CSP visibility will let you identify patterns in GPU firmware operational challenges and drive systemic improvements that no single customer engagement could surface alone.

Requirements

  • 15+ years of experience in GPU system software, GPU firmware, or accelerator platform engineering
  • BS or MS in Computer Science, Electrical Engineering, or related field (or equivalent experience)
  • Deep understanding of GPU architecture internals: streaming multiprocessors, GEMM execution, compute kernels, memory hierarchy, and how firmware/driver decisions impact GPU compute performance
  • Understanding of multi-GPU fabric architectures (NVLink or similar) and how firmware coordinates across multiple GPUs in a rack-scale system
  • Understanding of GPU firmware architecture: VBIOS, GPU microcontroller firmware, InfoROM, and their interaction with the GPU driver stack
  • Experience with firmware update lifecycle management at scale: multi-device update sequencing, A/B updates, rollback, staged rollout, emergency recovery
  • Understanding of GPU error handling and recovery flows — how firmware-level errors propagate through the driver stack to application-visible failures
  • Experience with GPU health monitoring and telemetry: Xid errors, thermal events, power events, ECC counters, and their significance for firmware/software teams
  • Customer obsession: genuine passion for simplifying GPU firmware integration for fleet-scale customers
  • Proven success influencing engineering teams to improve quality and fleet manageability

Nice To Haves

  • Direct experience with NVIDIA GPU VBIOS, GPU microcontroller firmware, or GPU driver internals
  • Background in GPU fleet management at 10K+ GPU scale — firmware rollout, health-based remediation, fleet-wide configuration management
  • Experience with GPU error taxonomy (Xid classification, NVLink error counters, ECC events) and building runbooks around GPU firmware behavior
  • Understanding of GPU security: secure boot chain, code signing, attestation, debug authentication, multi-tenancy isolation at the firmware level
  • Familiarity with GPU power management architecture and its impact on workload performance at fleet scale

Responsibilities

  • Drive GPU firmware and software work streams with CSP engineering teams, ensuring they understand GPU firmware architecture (VBIOS, InfoROM, microcontroller firmware), update sequencing, recovery procedures, and GPU power management
  • Gather and synthesize CSP feedback on GPU firmware/software — covering manageability, observability, security requirements (e.g., multi-tenancy isolation, secure boot, attestation), and performance — and champion those priorities into NVIDIA's GPU firmware/software feature roadmap and delivery plan
  • Drive GPU firmware update orchestration for large-scale deployments — multi-GPU update sequencing, rollback strategy, failure handling, and validation across hundreds of GPUs per rack
  • Serve as the technical focal point between NVIDIA and CSP firmware/software engineering — ensuring GPU behaviors (error recovery flows, thermal protection, power state transitions) are well-documented and accessible for customer integration
  • Identify cross-CSP GPU SW/FW issue patterns — common update failures, recovery gaps, and configuration problems — and drive documentation, tooling, and test strategy improvements

Benefits

  • equity
  • benefits