About The Position

For over 25 years, NVIDIA has been at the forefront of transforming computer graphics, PC gaming, and accelerated computing, driven by a legacy of continuous innovation and exceptional talent. We are now leveraging the immense potential of AI to usher in the next era of computing, where our GPUs power the "brains" of computers, robots, and autonomous vehicles that can comprehend the world. This pioneering work demands vision, innovation, and the world's best talent. Join our diverse and supportive environment, where NVIDIANs are inspired to excel and make a profound global impact. We're hiring a Senior Staff Software Engineer to own the engineering efforts across NVIDIA enterprise systems. You'll partner with IT leadership to transform reactive support into strategic, AI-infused, automated resolution systems that prevent problems before they occur, balancing speed, security, and an exceptional user experience for NVIDIANs.

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Engineering, IT, or related field (or equivalent experience)
  • 12+ years of overall experience in SRE, enterprise support, or DevOps
  • Experience with SaaS, hybrid cloud, and AI/ML environments
  • Experience building production-grade agentic workflows (e.g., multi-agent systems and MCP servers)
  • Software engineering fundamentals with deep experience in building products and operating large-scale systems
  • Expertise in two or more backend languages such as Go, Python, or Java with a track record of owning complex production systems.
  • Full-stack engineering experience, including building user-facing web applications and operational dashboards using modern frontend frameworks such as React.js, along with backend APIs and data pipelines
  • Systems thinker who naturally traces dependencies, considers second-order effects, and asks "why did this break?" not just "how do I fix it?"
  • Strong incident management skills: triage, root-cause analysis, blameless postmortems, pattern recognition
  • Expert troubleshooting across the enterprise hybrid stack, including Jira, Microsoft, operating systems (Apple, Linux, and Windows), and infrastructure systems such as compute, AI, and storage

Responsibilities

  • Design and implement agentic AI workflows using LLM-based agents, tool calling, RAG patterns, and orchestration frameworks.
  • Push the boundaries of what AI-assisted operations can achieve.
  • Build robust integrations and automation pipelines across ServiceNow, identity management, monitoring platforms, and enterprise SaaS.
  • Own the full stack, from infrastructure to user-facing tools.
  • Triage and resolve enterprise issues with a focus on automation and on improving mitigation and resolution times
  • Manage and troubleshoot enterprise-scale collaboration, productivity, AI, and infrastructure systems.
  • Trace and root-cause complex, multi-system failures; identify patterns in recurring tickets; and build automation or self-service solutions
  • Build and maintain runbooks, troubleshooting guides, and knowledge base articles that elevate team capabilities
  • Mentor team members on troubleshooting methodology and systems thinking

Benefits

  • You will also be eligible for equity and benefits