Manager, Software Engineering - Production AI Inference

NVIDIA, Santa Clara, CA
$224,000 - $431,250

About The Position

NVIDIA is the platform upon which every new AI-powered application is built. We are seeking a deeply technical software manager to lead production AI inference for NVIDIA Inference Microservices (NIM), the production runtime through which customers deploy optimized, enterprise-supported AI inference across cloud, data center, and edge environments. NIM makes state-of-the-art AI models available as a production-ready software stack, combining optimized inference engines, model profiles/recipes, validated runtime configurations, and security hardening. This role leads the team accountable for turning fast-moving model and inference engine work into reliable NIM releases that customers can operate with confidence.

This is a hands-on engineering management role for someone who can run production execution without managing from a distance. You will lead engineers working across model onboarding, serving stack integration, performance profiling/optimization, release quality, security readiness, automation, observability, and operational health. You will partner closely with the product, solution architect, security, research, and other internal engineering teams to make day-0 model launches repeatable and to raise the production bar for every NIM release.

Requirements

  • 10+ years overall building production software, including 3+ years managing software engineering teams.
  • Experience delivering production software with strong quality, reliability, and release expectations.
  • Experience driving process improvements and improving operational efficiency.
  • Excellent communication and collaborator management; ability to influence executive leadership across product, research, security, and operations.
  • Deep understanding of AI/ML fundamentals, innovative model architectures, inference engines and kernels, performance optimization strategies, accelerated computing, large-scale distributed systems, and security hardening.
  • A degree in Computer Science, Computer Engineering, or a related field (BS or MS) or equivalent experience.

Nice To Haves

  • Built and managed globally distributed organizations; established durable engineering processes that significantly improved quality and velocity across multiple teams.
  • Recognized industry leader with contributions to open-source ecosystems (e.g., vLLM, SGLang, TensorRT-LLM, Dynamo, Triton, PyTorch), technical publications, or talks in containers, Kubernetes, GPU, or inference communities.
  • Drove measurable performance improvements for large-scale LLM inference systems, including latency, throughput, GPU utilization, cost efficiency, and performance regression prevention across production releases.
  • Hands-on experience with core GPU technologies such as CUDA, cuDNN, CUTLASS, cuBLAS, NCCL, NIXL, NVLink, and GPUDirect RDMA.
  • Hands-on experience delivering enterprise or government-ready AI software, including FedRAMP, air-gapped deployments, regulated environments, security hardening, compliance evidence, and production support expectations.

Responsibilities

  • Lead the team responsible for shipping production-ready LLM NIMs, including planning, new model onboarding, validated serving recipes, release readiness, and post-release follow-through.
  • Build a predictable operating model for the team through roadmap planning, a weekly execution rhythm, launch checklists, clear ownership boundaries, collaborator communication, and issue management.
  • Own project execution by anticipating schedule, staffing, and dependency risks.
  • Adapt plans under pressure and collaborate with peer managers to reprioritize engineering timelines as needed, staying agile in the fast-paced AI industry.
  • Drive continuous improvement in production workflows through root cause and corrective action (RCCA) and partner feedback, removing unnecessary and redundant work while keeping the team passionate about production outcomes.
  • Build and maintain a world-class AI inference engineering team by fostering an innovative culture, setting clear expectations, maintaining active feedback loops, and mentoring engineers and emerging leaders.

Benefits

  • Competitive salaries
  • Generous benefits package
  • Equity