Infrastructure Systems Software Engineer

NVIDIA
Austin, TX
Posted 1d ago, Hybrid

About The Position

As a Software Engineer in NVIDIA’s Internal Infrastructure Group, you’ll design and build distributed systems that power the workflows behind our next generation of GPUs and AI chips. The software you create will help thousands of engineers develop world-changing technology faster, more efficiently, and at scale. You’ll help scale the infrastructure that validates the world’s most advanced GPUs.

Requirements

  • BS or MS in Computer Science or a related field (or equivalent experience).
  • 5+ years of professional software development experience.
  • Strong understanding of data structures, algorithms, concurrency, and system design.
  • Proficiency in modern programming languages (Python, C++, Go, or similar) on Linux systems, with experience building large-scale services, infrastructure tooling, or distributed systems.
  • Ability to reason about trade-offs between performance, reliability, and maintainability.

Nice To Haves

  • Experience developing or scaling distributed systems and internal developer tools.
  • Passion for improving engineering workflows and enabling others to move faster.
  • Hands-on familiarity with profiling, tracing, or performance-optimization techniques.
  • Understanding of chip-design, verification, or modern ML workflows.

Responsibilities

  • Build and extend scalable, high-performance infrastructure services, platforms, and tools that improve reliability and developer productivity across NVIDIA’s chip-design ecosystem.
  • Design and optimize distributed workflows that orchestrate millions of regression and validation workloads across heterogeneous compute clusters.
  • Own systems end-to-end, from gathering requirements and proposing technical designs to implementation, performance analysis, testing, and deployment.
  • Collaborate with internal teams to understand their workflows, identify bottlenecks, and deliver automation that accelerates engineering work.
  • Analyze and tune system performance across distributed services using profiling, tracing, and telemetry to help bring next-generation hardware and AI models to market faster.

Benefits

  • Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.
  • The base salary range is 152,000 USD - 241,500 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4.
  • You will also be eligible for equity and benefits.