Nuclearn.ai builds AI-powered software for the nuclear and utility industries: tools that keep critical infrastructure reliable, efficient, and safe. Our software integrates AI-driven workflow, documentation, and research automation, and is already used at 60+ nuclear reactors across North America. You'll ship production infrastructure that operators and engineers rely on every day. We're growing quickly, expanding both our team and our Phoenix AI data center. The work is consequential: the infrastructure you build and maintain is the foundation everything else runs on.

Eligibility: U.S. citizenship or permanent residency (green card) is required due to DOE export compliance.

What You'll Do

This is a hands-on infrastructure role. You will physically build, operate, and scale the GPU compute environment that powers our AI platform, not design it from a desk.

- Build and operate our Phoenix AI data center: rack and cable GPU servers, configure power distribution, manage cooling and airflow, maintain redundancy, and handle the firmware and hardware lifecycle. You own uptime.
- Plan and execute infrastructure scaling: spec and procure hardware, run capacity planning against real workload data, and execute GPU refreshes, storage expansions, and network upgrades with minimal disruption to production.
- Own the full stack from power to container: configure bare-metal servers, IPMI/BMC management, OS provisioning, networking (switches, VLANs, cabling), storage, and container runtimes. Troubleshoot across the entire hardware-software boundary.
- Partner with utility IT teams on customer deployments: review and validate customer-proposed infrastructure for hosting Nuclearn applications, identify GPU/runtime mismatches, networking gaps, and configuration issues before go-live, and provide concrete remediation guidance.

You will operate as a senior individual contributor with high autonomy and direct influence across engineering, ML, product, and customer environments.
Examples of problems you might own in your first 90 days

- Rack and commission a new GPU node: receive hardware, plan rack placement for power and thermal constraints, install rails, cable power and networking, configure the BMC, provision the OS, validate GPU functionality, and hand off a production-ready machine to the ML team.
- Develop a hardware requirements standard for both internal and customer-facing deployments: GPU sizing models, storage thresholds, power and cooling requirements, networking specs, and supported configurations.
- Audit the Phoenix data center end-to-end: map current power draw against capacity, identify thermal hotspots, assess cable management, review redundancy gaps, and execute targeted upgrades to keep pace with scaling workloads.
- Validate a utility customer's proposed infrastructure before deployment: catch a GPU/driver mismatch, flag insufficient network throughput, or identify a cooling limitation that would throttle inference performance under load.
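To make the capacity-audit task above concrete, the core check is simple arithmetic: compare a rack's measured power draw against its rated capacity after reserving a safety margin. The sketch below is a minimal illustration of that calculation only; the function name, margin, and wattage figures are hypothetical examples, not Nuclearn tooling or specs.

```python
# Hypothetical sketch of a rack power-headroom check.
# All names, thresholds, and numbers are illustrative assumptions.

def rack_headroom(draw_watts: float, capacity_watts: float, margin: float = 0.8) -> float:
    """Return usable headroom in watts after reserving a safety margin.

    margin is the fraction of rated capacity treated as the safe operating
    ceiling (0.8 leaves a 20% reserve for transients and redundancy).
    A negative result means the rack is already over its safe ceiling.
    """
    ceiling = capacity_watts * margin
    return ceiling - draw_watts

# Example: a rack drawing 17.3 kW on a 24 kW feed with an 80% ceiling
headroom = rack_headroom(17_300, 24_000)  # 24000 * 0.8 - 17300 = 1900 W
```

In practice the draw figure would come from PDU or BMC telemetry per rack, and a negative headroom value would flag the rack for redistribution before adding nodes.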
Job Type
Full-time
Career Level
Senior
Education Level
No Education Listed