NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you’ll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world.

We are seeking a highly skilled and experienced HPC Cluster Engineer to design, deploy, and operate GPU Compute Clusters for EDA and high-performance computing workloads used across multiple teams and projects. Join our engineering team and collaborate with researchers and infrastructure teams to ensure our GPU clusters perform efficiently, scale well, and remain reliable.

What you'll be doing:
Provide leadership and strategic mentorship on the management of large-scale HPC systems, including the deployment of compute, networking, and storage.
Develop and improve our ecosystem around GPU-accelerated computing, including developing scalable automation solutions.
Continuously improve infrastructure provisioning, management, observability, and day-to-day operations through automation.
Build and nurture customer and cross-team relationships to consistently support the clusters and address changing user needs.
Support our researchers in running their workloads, including performance analysis and optimization.
Conduct root cause analysis and suggest corrective action. Proactively find and fix issues before they occur.
Build innovative tooling to accelerate researchers' velocity, debugging, and software performance at scale.
Job Type
Full-time
Career Level
Mid Level
Number of Employees
5,001-10,000 employees