NVIDIA is seeking a Senior DevOps Engineer to operate its AI Data Center AIOps platform. This role focuses on maintaining the platform's uptime, performance, data integrity, and safe change management. The engineer will be responsible for SLOs/SLIs, incident response, and postmortems related to telemetry ingestion, processing, storage, and APIs/dashboards. This position involves collaboration with Software Engineering and Systems Engineering teams to translate platform signals into actionable alerts and automation. The goal is to ensure the reliability and efficiency of the platform that operators depend on.
Job Type: Full-time
Career Level: Senior
Education Level: Associate degree
Number of Employees: 5,001-10,000 employees