About The Position

NVIDIA is seeking senior software engineers who are passionate about performance and interested in optimizing self-driving solutions that run on NVIDIA’s multi-computer, heterogeneous HW architectures. Our team builds NVIDIA’s end-to-end autonomous driving applications. This role involves developing, maintaining, and optimizing the latency and throughput of NVIDIA’s L2/L3/L4 autonomous driving solutions. You will devise acceleration strategies and patterns to improve software architecture and its efficiency on our computers with multiple heterogeneous hardware engines while meeting or exceeding product goals. The role also includes developing highly efficient product code in C++, making use of the algorithmic parallelism offered by GPGPU programming (CUDA) and ARM NEON, while following quality and safety standards such as those defined by MISRA. Collaboration with HW, product, OS, and safety teams to design next-gen products is also a key aspect of this position.

Requirements

  • MS or PhD degree in Computer Science, Computer Architecture, Electrical Engineering or related field (or equivalent experience).
  • 12+ years of relevant professional experience working on autonomous vehicle software.
  • Excellent C and C++ programming skills.
  • Solid understanding of programming and debugging techniques, especially for parallel architectures.
  • Good understanding of system software/operating systems and computer architecture.
  • Experience with performance analysis, optimizations and benchmarking.
  • Outstanding communication and collaboration skills as this role might require significant interfacing with other teams within NVIDIA.

Nice To Haves

  • Understanding of embedded architectures and real-time operating systems & scheduling.
  • Strong mathematical fundamentals, including linear algebra and numerical methods.
  • Experience implementing algorithms in robotics, computer vision, and/or machine learning.
  • Software development experience with CUDA/GPGPU or any data parallel architectures.
  • Deep learning architecture/performance work on any HW accelerator, especially GPUs.

Responsibilities

  • Develop, maintain and optimize latency and throughput of NVIDIA’s L2/L3/L4 autonomous driving solutions.
  • Devise acceleration strategies and patterns to improve software architecture and its efficiency on our computers with multiple heterogeneous hardware engines while meeting or exceeding product goals.
  • Develop highly efficient product code in C++, making use of the algorithmic parallelism offered by GPGPU programming (CUDA) and ARM NEON, while following quality and safety standards such as those defined by MISRA.
  • Collaborate with HW, product, OS, and safety teams to design next-gen products.

Benefits

  • Equity
  • Benefits