Principal Engineer, Data Center Power Software

NVIDIA•Redmond, WA

1d•$272,000 - $431,250

About The Position

NVIDIA is a leader in groundbreaking developments in Artificial Intelligence, High-Performance Computing, and Visualization, with its GPU invention at the heart of its products and services. The company is seeking a Principal Engineer for Data Center Power Software to help accelerate the next wave of artificial intelligence. This role involves leading the software development lifecycle for data center power: gathering use cases and requirements, translating them into software roadmaps, and executing those roadmaps across internal NVIDIA teams and external partners. The position requires reporting project status, risks, and roadmap pivots to internal and external executives. Key aspects include brokering technical discussions between highly technical subject matter experts and leveraging AI tools and workflows for quick iteration on designs, prototypes, documentation, tests, and code. The role also entails architecting distributed, robust, and scalable GoLang and Rust system software, deployed to monitor and manage large datacenters. NVIDIA is recognized as one of the technology industry's most esteemed employers, valuing diversity and innovation.

Requirements

BS or higher in Computer Science or equivalent experience
15+ years of meaningful industry experience with a strong scalable system software development background
Experience with APIs and interface design
Experience with AI tools and development workflows
Outstanding written and verbal interpersonal skills
Business-level English
Strong motivation and commitment to learn new skills
Ability to manage time in a fast, heavily multitasked environment
Development experience with Rust, Python, and/or GoLang
Development experience with distributed systems and concurrent applications, especially in a Kubernetes environment
Ability to quickly understand unfamiliar technical domains, identify core problems, and translate ambiguous requirements into actionable engineering plans
Skilled at producing clear technical documentation, design docs, and status updates that keep cross-functional partners aligned
Track record of identifying process inefficiencies and introducing automation, tooling, or AI-powered workflows that measurably improve team output

Nice To Haves

Development experience in relevant coding languages like GoLang and Rust
Experience with SCADA or data center power-related software
Background with containers (e.g. Docker, OCI), orchestration frameworks, and logging/telemetry backends, including Kubernetes monitoring stacks built on tools such as Prometheus, Loki, and Grafana
Experience with modern UI development in React and Node.js or similar frameworks
Experience developing Kubernetes operators or Helm charts
Experience with HPC job schedulers like Slurm or Run.AI
Familiarity with Kubernetes internals
Exposure to GPU programming with CUDA

Responsibilities

Gathering use cases and requirements, translating those into software roadmaps, and executing those roadmaps across internal NVIDIA teams and external partners
Reporting project status, risks, help needed, and roadmap pivots to internal and external executives via status reports and in-person meetings
Brokering technical discussions between highly technical subject matter experts
Leveraging AI tools and workflows to quickly iterate on designs, prototypes, documentation, tests, and code
Architecting distributed, robust, and scalable GoLang and Rust system software, deployed to monitor and manage large datacenters