About The Position

NVIDIA has become the platform upon which every new AI-powered application is built. From healthcare research applications to autonomous vehicles and voice-recognition systems, the need for advanced perception and cognitive capabilities is exploding, and NVIDIA is at the center of this revolution. We are seeking a motivated Senior Systems Software Engineer to join our Autonomous Vehicle Infrastructure organization, focusing on building, deploying, and operating validation platforms at scale. In this role, you will work with internal teams and external partners to integrate distributed systems, manage large-scale data pipelines, and operationalize next-generation validation workflows for autonomous driving. This role offers a chance to start from the ground up: standing up new vendor-provided platforms, validating integration paths, and ensuring that infrastructure is reliable, secure, and production-ready. You will combine hands-on engineering, infrastructure deployment, and workflow automation to help scale our AV validation ecosystem.

Requirements

  • BS/MS in Computer Science, Computer Engineering, or relevant field (or equivalent experience).
  • 5+ years of professional experience in infrastructure, distributed systems, or platform engineering.
  • Hands-on experience with Linux systems, Kubernetes/Docker, Terraform, and CI/CD pipelines.
  • Strong scripting/development skills in Python and Bash, with exposure to C++ and/or Go.
  • Familiarity with the Bazel build/test automation framework.
  • Experience in data/log ingestion workflows and distributed compute/storage systems.
  • Strong debugging, problem-solving, and communication skills to work across internal and vendor teams.
  • Proven comfort using AI-based development tools, such as Claude Code and Cursor.

Nice To Haves

  • Strong experience in large-scale distributed systems or GPU/CPU cluster deployments, infrastructure automation, data pipelines, and AWS.
  • Prior experience with scenario-based validation platforms or AV simulation ecosystems.
  • Strong knowledge of logging/monitoring/alerting frameworks (Prometheus, Grafana, ELK stack).
  • Experience working directly with external vendors to integrate platforms and operationalize SLAs.
  • Proactive use of AI/ML techniques to accelerate log analysis, coverage metrics, or integration workflows.

Responsibilities

  • Deploy and operationalize vendor-provided platforms in our cloud-based service platform, starting with test environments to validate dependencies, workflows, and performance.
  • Build and maintain distributed infrastructure that supports large-scale log ingestion, data processing, and scenario validation.
  • Automate workflows and pipelines using Go, Python, Bash, and Bazel to ensure reproducibility, efficiency, and reliable distributed execution.
  • Integrate simulation and drive logs (e.g., world model data, road geometries) in various formats (e.g., Protobuf, Parquet) with validation platforms, ensuring seamless end-to-end coverage analysis.
  • Provide visualization and reporting capabilities to surface validation results, coverage metrics, and actionable insights for developers and stakeholders.
  • Define and manage access controls, monitoring, and security policies to ensure compliance while enabling smooth collaboration across internal and vendor teams.
  • Partner closely with internal teams and external vendors to troubleshoot issues, refine SLAs, and continuously improve operational reliability and scalability.

Benefits

  • Equity
  • Benefits