About The Position

NVIDIA is seeking a Senior Software Engineer to join our CSP Engagements team, focusing on system software for data center products such as GB200. This role combines deep technical expertise in embedded firmware, Linux kernel development, and middleware development with customer-facing responsibilities to enable cloud service providers (CSPs) with next-generation computing platforms. You will work at the intersection of hardware and software, driving technical solutions from concept through deployment.

Requirements

  • Deep expertise in data center server architectures, HPC systems, and hardware-software co-design.
  • Expert knowledge of Linux kernel internals, device drivers, and communication protocols (PCIe, USB, Ethernet).
  • Deep understanding of computer architecture and microprocessor concepts, with expert knowledge of ARM (aarch64) and x86 architectures.
  • Deep understanding of NUMA architectures, including memory topology, processor-memory locality, and performance optimization for multi-CPU systems in data center environments.
  • Strong programming skills in C/C++ and Python, plus experience with virtualization, Kubernetes, and cloud-native architectures.
  • Skilled in complex system-level debugging, performance analysis, and test design.
  • BS or MS in Computer Engineering, Computer Science, or related field (or equivalent experience).
  • 8-12 years of system software development experience.

Nice To Haves

  • Experience with GPU computing (CUDA) and deep learning workloads.
  • Expertise in out-of-band and in-band management architectures.
  • Knowledge of memory fabric and CXL architectures.

Responsibilities

  • Design and develop software solutions for data center servers including Linux kernel modifications, device drivers, and system optimizations for GB200 and next-gen platforms.
  • Lead hardware bring-up activities, BSP development, and hardware-software co-design for Cloud Service Provider deployments.
  • Partner directly with CSPs to deliver technical solutions, co-develop and co-debug features and optimizations, and provide support during new product introductions.
  • Collaborate with cross-functional teams in designing end-to-end solutions spanning firmware, OS, middleware, and applications, with a focus on AI/ML and HPC workloads.
  • Perform advanced system debugging, root cause analysis, and performance optimization for large-scale data center environments.
  • Collaborate with AE, FAE, and Solution Architect teams to deliver integrated customer solutions and technical documentation.

Benefits

  • You will also be eligible for equity and benefits.