NVIDIA's Observability team is seeking a Senior/Staff Engineer to design and build the next-generation, multi-region observability platform. This platform powers our rapidly expanding AI, Data, and Observability ecosystem at immense scale: every day it processes trillions of metrics, hundreds of terabytes of logs, and billions of distributed traces across high-performance datacenters and multi-cloud environments.

This is a high-impact, architecture-heavy, code-first role. You will architect, build, and operate NVIDIA's unified observability stack, which covers metrics, logs, traces, profiles, and analytics. Your ownership will span the entire telemetry pipeline, including ingestion, storage, query routing, governance, multi-tenant isolation, GPU-accelerated analytics, and real-time insights. You will play a crucial role in shaping NVIDIA's global observability strategy, collaborating with teams across GPU Compute, Distributed Systems, Networking, ML Infra, AI Platform, and Cloud Services to give engineers deep access to system health, performance, and debugging signals.
Job Type: Full-time
Career Level: Mid Level
Number of Employees: 5,001-10,000 employees