We're looking for a Principal Software Engineer to join our CSP Engagements team as the technical focal point for fleet-scale reliability, working directly with engineering teams of key CSP / hyperscale customers to ensure NVIDIA platforms achieve target MTBI (Mean Time Between Interruptions) in production. In this role, you will augment NVIDIA's internal software/firmware and quality teams with a dedicated CSP-facing focus. You will drive work streams with CSP engineering teams to build shared understanding of reliability software/firmware architecture, methodology, incorporate their fleet telemetry and failure data into NVIDIA's improvement priorities, and validate that reliability improvements measured in the lab translate to real customer environments. Your cross-CSP visibility enables you to distinguish systemic architectural gaps from environmental or configuration-specific issues that no single customer engagement could identify alone.
Stand Out From the Crowd
Upload your resume and get instant feedback on how well it matches this job.
Job Type
Full-time
Career Level
Principal