Principal Engineer, Data Center Power Software

NVIDIA • Redmond, WA

About The Position

NVIDIA is a leader in AI, High-Performance Computing, and Visualization. The GPU, NVIDIA's invention, is central to modern computers and powers groundbreaking products and services. This role involves gathering use cases and requirements, translating them into software roadmaps, and executing these roadmaps with internal teams and external partners. The position requires reporting project status, risks, and needs to executives, brokering technical discussions, and leveraging AI tools for design, prototyping, documentation, testing, and coding. The core technical responsibility is architecting distributed, robust, and scalable GoLang and Rust system software for monitoring and managing large datacenters.

Requirements

BS or higher in Computer Science or equivalent experience.
15+ years of meaningful industry experience, with a strong background in scalable system software development.
Experience with APIs and interface design.
Experience with AI tools and development workflows.
Outstanding written, verbal, and interpersonal skills.
Business-level English.
Strong motivation and commitment to learn new skills.
Ability to manage time in a fast-paced, heavily multitasked environment.
Development experience with Rust, Python, and/or GoLang.
Development experience with distributed systems and concurrent applications, especially in a Kubernetes environment.
Ability to quickly understand unfamiliar technical domains, identify core problems, and translate ambiguous requirements into actionable engineering plans.
Skilled at producing clear technical documentation, design docs, and status updates that keep cross-functional partners aligned.
Track record of identifying process inefficiencies and introducing automation, tooling, or AI-powered workflows that measurably improve team output.

Nice To Haves

Development experience in relevant coding languages like GoLang and Rust.
Experience with SCADA or data center power-related software.
Background with containers (e.g., Docker, OCI), orchestration frameworks, and logging/telemetry backends, including Kubernetes monitoring stacks built on tools such as Prometheus, Loki, and Grafana.
Experience with modern UI development in React and Node.js or similar frameworks.
Experience developing Kubernetes operators or Helm charts.
Experience with HPC job schedulers like Slurm or Run:ai.
Familiarity with Kubernetes internals.
Exposure to GPU programming with CUDA.

Responsibilities

Gathering use cases and requirements, translating them into software roadmaps, and executing those roadmaps across internal NVIDIA teams and external partners.
Reporting project status, risks, help needed, and roadmap pivots to internal and external executives via status reports and in-person meetings.
Brokering technical discussions between highly technical subject matter experts.
Leveraging AI tools and workflows to quickly iterate on designs, prototypes, documentation, tests, and code.
Architecting distributed, robust, and scalable GoLang and Rust system software, deployed to monitor and manage large datacenters.