About The Position

We are looking for a Sr. Engineer to design, build, and scale the infrastructure powering NVIDIA’s AI agent ecosystem. You will work at the intersection of distributed systems, developer platforms, and agentic AI — building the foundational services that enable teams across the company to develop, deploy, orchestrate, and operate autonomous AI agents at production scale.

Requirements

  • Bachelor's or Master's degree in Computer Science, Engineering, or related field (or equivalent experience), with 8+ years in software engineering — ideally in platform engineering, infrastructure, or developer tools
  • Experience building and scaling AI agents in production using agent frameworks and SDKs such as Claude Code, Codex, or LangGraph
  • Deep Kubernetes expertise including pod orchestration, persistent storage, RBAC, and multi-cluster management
  • Strong Python skills with production API experience using FastAPI, Flask, or similar web frameworks, including async frameworks
  • Proven track record designing distributed systems with Kafka, Redis, and MongoDB or PostgreSQL
  • Expertise building and managing robust CI/CD pipelines using GitLab CI and ArgoCD for continuous delivery to Kubernetes
  • Experience designing AI data platform components (ingestion pipelines, vector stores, retrieval APIs, data preprocessing workflows) and building developer-facing platform APIs consumed by multiple engineering teams
  • Solid grasp of auth and identity: OAuth 2.0, JWT, token exchange, and secrets management with Vault
  • History of leading complex technical projects, such as migrations or greenfield platform builds, with the interpersonal skills to drive alignment across teams and the ability to write clear design documents

Nice To Haves

  • Experience building or operating AI agent platforms or agentic workflow systems, with hands-on expertise in agent protocols (MCP, A2A) and frameworks (LangChain, LangGraph)
  • Hands-on experience with RAG architectures, embedding pipelines, and vector databases (Milvus, Pinecone, or Weaviate)
  • Full-stack skills with React or Vue for building developer portals and dashboards
  • Contributions to open-source infrastructure or platform tooling

Responsibilities

  • Design and build platform services that manage the full agent lifecycle, from registration through deployment, execution, and teardown
  • Architect Kubernetes-based execution environments with pod lifecycle management, namespace isolation, persistent storage, and identity propagation
  • Develop and maintain automated CI/CD pipelines using GitLab CI and ArgoCD, including reusable pipeline templates and deployment blueprints that standardize how agents are built across teams
  • Build framework-agnostic infrastructure that supports multiple agent SDKs (Claude Code, OpenAI Codex, LangGraph), applying hands-on experience with agent harnesses, lifecycle hooks, configurable skills, observability (OpenTelemetry), and memory services
  • Build and operate Kafka-based message pipelines and real-time event streaming using Redis Pub/Sub and server-sent events (SSE)
  • Develop data ingestion pipelines, access interfaces, and storage layers that power AI agent knowledge and context
  • Implement session management for state persistence, conversation history, and agent recovery across sessions
  • Develop multi-layer authentication using OAuth 2.0, JWT validation, token exchange, and gateway integration, and manage the secrets lifecycle with Vault (provisioning, rotation, and container injection)
  • Partner with security teams on compliance, access controls, and approval workflows for agent operations

Benefits

  • Equity
  • Benefits
© 2026 Teal Labs, Inc