Autonomous Agent Engineer

NVIDIA • Santa Clara, CA

About The Position

We're building the infrastructure that lets AI agents operate autonomously and securely at NVIDIA. This role owns the execution environments, state management systems, and security boundaries that make autonomous agents safe and reliable. The team designs and ships SDKs, CLIs, and developer tooling that turn complex sandboxing into a straightforward experience for agent builders and users across the company. Today, "sandbox" means different things to different teams: Docker containers, microVMs, or full virtual machines, each with different security guarantees. We need someone who can navigate these tradeoffs and build a unified developer experience on top of them. This work is greenfield! Many of the problems we're solving don't have existing industry solutions, and we want someone who is energized by that.

Requirements

BS or MS in Computer Science, Engineering, or a related field (or equivalent experience)
8+ years building distributed systems, infrastructure, or developer platforms at scale
Deep systems engineering skills: containers, microVMs, Kubernetes, Linux security primitives
Track record of shipping developer SDKs or CLIs that are adopted by multiple teams
Experience building agents using various frameworks and harnesses in an enterprise context
Proficiency in Python, Go, Rust, or similar

Nice To Haves

Experience building execution environments for agentic AI systems or LLM applications that execute code autonomously
Experience with sandboxing and isolation technologies (gVisor, Firecracker, Kata Containers, V8 isolates, or similar)
Strong security fundamentals: threat modeling, auth, least privilege, secrets management
Experience designing multi-tenant execution platforms, serverless infrastructure, or sandboxed compute at scale
Background in durable execution patterns or checkpoint/recovery systems for long-running workloads

Responsibilities

Architect sandboxed compute environments where agents securely execute code, access tools, and interact with external services
Design and ship SDKs (Python, Go) and CLI tooling for provisioning and managing agent workloads in isolated environments
Create onboarding templates, reference implementations, and CLI workflows that make secure execution the default
Build state management for long-running agent operations, including checkpoint and recovery
Embed security into SDK primitives such as isolation policies, secrets injection, network policies, capability declarations, and kill switches
Engineer auth integrations for workload identity, delegated tool access, and scope attenuation without static secrets
Build observability and audit infrastructure: structured logs, decision traces, security telemetry, and audit trails wired into enterprise monitoring