Senior Full-Stack Software Engineer

NVIDIA • Santa Clara, CA

About The Position

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 30 years. Today, we’re at the forefront of AI innovation, powering breakthroughs in research, autonomous vehicles, robotics, and more. The DGX Cloud team builds and operates the AI infrastructure that fuels this progress.

We’re looking for a Senior Full-Stack Software Engineer to join the AI Hub team within the DGX Cloud AI Infrastructure organization. The AI Hub team accelerates AI research by ensuring NVIDIA’s AI infrastructure is used efficiently, transparently, and at scale. Our focus is building a unified, self‑service “single pane of glass” portal that enables AI researchers to manage, monitor, and optimize their usage of Managed AI research Superclusters.

This role is ideal for an experienced engineer who enjoys owning well‑scoped features end-to-end, collaborating closely with peers and senior engineers, and building reliable, user-facing systems that operate at scale.

Requirements

5+ years of professional software engineering experience building and operating production web systems.
Bachelor’s degree in Computer Science or a related technical field (or equivalent experience).
Solid experience with full‑stack development, including:
Modern frontend frameworks (React / Next.js or similar)
JavaScript / TypeScript
One or more backend languages or runtimes such as Node.js, Python, or Go
Hands‑on experience with cloud platforms (AWS, GCP, or Azure), containers (Docker), and basic orchestration concepts (Kubernetes).
Familiarity with RESTful API design, schema evolution, and integration patterns.
Experience working with observability tools such as Prometheus, Grafana, OpenSearch, or similar.
Strong problem‑solving skills, attention to detail, and the ability to work effectively within a team.
Clear written and verbal communication skills and a willingness to learn and grow.

Nice To Haves

Experience building internal platforms or self‑service tools for engineers or researchers.
Exposure to machine learning infrastructure or AI workloads (hands‑on or close collaboration with ML teams).
Familiarity with GPU‑backed systems or large-scale distributed environments.
Experience using AI‑assisted development tools to improve productivity and code quality.

Responsibilities

Design, implement, and maintain full‑stack features across frontend, backend services, and data layers, meeting defined reliability, availability, and performance expectations.
Own delivery of well‑scoped components or features, from design through implementation, testing, and production support.
Contribute to system reliability, performance, and observability improvements by identifying bottlenecks, fixing issues, and following established best practices.
Collaborate with product managers, designers, and AI research stakeholders to understand user needs and translate them into clear technical solutions.
Participate in design reviews, code reviews, and on‑call rotations to help maintain a high-quality, production‑ready system.
Follow and contribute to team standards for code quality, testing, security, and CI/CD.
Learn and apply modern practices in cloud infrastructure, deployment, and monitoring with guidance from senior engineers.