About The Position

Widely considered one of the technology world’s most desirable employers, NVIDIA leads the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing (HPC), and Visualization. DGX Cloud provides a serverless generative AI infrastructure to the world, enabling anyone to use NVIDIA’s AI supercomputer technologies. The mission of DGX Cloud engineering is to ensure our customers receive timely, quality-assured releases. We are seeking a DevOps Engineer proficient in development infrastructure and tools, with hands-on experience in continuous integration, continuous deployment, testing frameworks, and Kubernetes (K8s) based cluster automation technologies. If you excel at problem-solving, can think creatively on your feet, and enjoy working in a distributed team setting, we would love to have you join us!

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Data Science, or a related field (or equivalent experience)
  • 5+ years of experience developing DevOps tooling, with a profound passion for automation
  • Solid background in modern source control platforms (GitHub/GitLab)
  • Strong experience in modern CI/CD technologies (GitLab, testing frameworks, ArgoCD)
  • Proficient in container-based infrastructure (Docker, Kubernetes, Helm)
  • Comprehensive experience with Linux distributions (Ubuntu)
  • Solid background in scripting languages (Bash, Python)
  • Working background in higher-level languages (Go)
  • Excellent written and verbal communication skills

Nice To Haves

  • Experience in scaling DevOps practices across cross-functional teams
  • Demonstrated ability to handle sophisticated technical environments while meeting or exceeding all security, reliability, scalability, and availability metrics
  • Strong, proven knowledge of modern architectures at scale

Responsibilities

  • Provide both development and operational tooling critical to DGX Cloud services
  • Implement and operate services used by engineering, including first-level on-call / support
  • Assist engineering by maintaining a well-optimized and supported paved-road SDLC, which includes working across engineering, testing, and SRE to ensure tool alignment
  • Ensure testing coverage from unit testing to CI to smoke testing to full end-to-end testing
  • Provide developer environments that are easily updated with a low barrier to entry
  • Develop and maintain continuous integration pipeline templates and testing frameworks
  • Provide and operate continuous testing end-to-end integration environments
  • Automate deployment, configuration, and management of Kubernetes (K8s) components