Senior AI-Native Systems Software Engineer, TensorRT

NVIDIA•Santa Clara, CA

Hybrid

About The Position

Join NVIDIA’s TensorRT team to lead a first-of-its-kind, AI-native initiative, aiming to make TensorRT the default entry point for out-of-framework inference globally. This role involves moving beyond traditional development cycles by building a new framework from the ground up, leveraging swarms of AI agents to produce high-performance, high-quality, modern C++ software at an unprecedented scale. The ideal candidate is a systems-thinking C++ engineer eager to help scale an agentic development framework, stay on top of state-of-the-art deep learning breakthroughs, and improve users’ experience with lightning-fast model onboarding.

Requirements

BS, MS, or PhD in Computer Science, Computer Engineering, AI, or equivalent experience.
4+ years of relevant software development experience.
Strong modern C++ skills: Proficiency with C++11/14/17 (or newer) and the STL, with an emphasis on clean, maintainable, performant code.
Deep learning familiarity: Experience with modern inference frameworks and an understanding of the architectural nuances of LLMs, Diffusion, and multi-modal models.
Systems thinking: Interest in how software architecture must evolve to support automated, agent-driven development and codebases that keep scaling without a fixed ceiling.
End-to-end product sense: Ability to translate high-level customer needs into concrete technical requirements and user-centric solutions.
Pragmatic execution: Demonstrated ability to go from customer requests to production-quality software on tight timelines.
Collaborative mindset: Excellent communication skills and comfort working across internal organizations and with customers.

Nice To Haves

Agentic framework experience: Hands-on work with AI agent orchestrators or multi-agent coding frameworks, or experience building custom agentic coding harnesses for production software.
CUDA & kernel expertise: Experience with CUDA programming or exposure to kernel generation / autotuning efforts.
High-velocity prototyping: A track record of rapidly turning state-of-the-art papers into working prototypes in days, not weeks.
Performance profiling skills: Expertise in software performance analysis, profiling, and optimization (CPU and/or GPU), including using tooling to drive measurable wins.

Responsibilities

Architecting an AI-native framework: Help design and build a codebase and architecture that scales beyond human capacity, supporting large numbers of AI agents working in parallel to generate, test, and validate production-grade software.
Scaling through agentic workflows: Improve the ratio of compute-to-software output by adopting and building AI-native tools, multi-agent orchestrators, and codebase harnesses that keep humans focused on the highest-value work.
Rapid prototyping with SOTA models: Act as a technical scout, identifying industry and academic breakthroughs (e.g., new attention mechanisms, KV cache strategies) and dispatching AI agent swarms to prototype and integrate these capabilities into our framework.
Delivering a great user experience: Ensure a seamless, high-performance path to production for the latest model families (LLMs, Diffusion, Audio, Vision, and multi-modal models).
Extreme performance optimization: Work at the intersection of Python orchestration and C++ engine-level optimizations to achieve major latency and throughput gains for critical customer use cases.