Software DevOps Engineer

NVIDIA, Santa Clara, CA

About The Position

NVIDIA is looking for an outstanding candidate to solve SW integration challenges for our next-generation data center platforms. You will be at the heart of our latest GPU architectures and advanced AI infrastructure projects, ensuring the seamless integration of world-class technologies in the areas of high-speed communication and virtualization. You will support products that leverage Ethernet and InfiniBand protocols, delivering a broad range of advanced compute and networking technologies for the world's most demanding AI workloads. In this role, you will provide first-tier support to R&D teams, acting as the bridge between pioneering hardware and stable software deployments.

Requirements

  • Bachelor of Science degree in Computer Science or a similar field, or equivalent experience.
  • Proven software engineering background with a deep understanding of standard methodologies in software development, modern Linux-based operating systems, and computer networking.
  • 5+ years of overall experience in DevOps, SRE, or Systems Integration roles.
  • Deep knowledge of Linux distributions (Ubuntu/RHEL) and containerization using Docker.
  • Coding skills in C/C++, Python, and Bash for automation and system-level fixes.
  • Experience with GitLab and GitLab CI for managing complex build pipelines.
  • Ability to multi-task, self-manage in a fast-paced environment, and lead technically during critical system failures.
  • Excellent problem-solving and critical thinking abilities.

Nice To Haves

  • In-depth knowledge of high-performance networking (InfiniBand, Ethernet).
  • Practical experience with gRPC, gNMI, REST, and JSON for system management and telemetry.
  • A proven track record of working on large-scale HW+SW converged systems (e.g., rack-scale computing or GPU clusters).

Responsibilities

  • Troubleshooting and prioritizing issues in complex systems during high-stakes bring-ups and proofs of concept (PoCs) for next-generation computing architectures.
  • Managing the integration of large-scale products involving GPUs, complex network stacks, firmware, and drivers.
  • Creating, recreating, and redeploying software artifacts. You will be responsible for fixing code, updating builds, or providing creative workarounds to unblock development.
  • Serving as the primary technical point of contact for R&D teams to resolve immediate infrastructure and integration blockers.
  • Working closely with R&D, Verification, and DevOps teams to streamline the CI/CD pipeline for specialized high-speed interconnect and system management projects.

Benefits

  • With competitive salaries and a generous benefits package, we are widely considered to be one of the technology world’s most desirable employers.
  • We have some of the most forward-thinking and hardworking people in the world working for us and, due to unprecedented growth, our exclusive engineering teams are rapidly growing.
  • Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.
  • You will also be eligible for equity and benefits.