CVP AI Infrastructure

Advanced Micro Devices, Inc
San Jose, CA
6d

About The Position

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

This role is not eligible for visa sponsorship.

THE ROLE:

We are seeking a strategic and execution-oriented Vice President of Engineering Infrastructure to lead our critical developer platform serving 1,000+ engineers. You will own the end-to-end infrastructure that powers development, testing, builds, performance engineering, and large-scale distributed workloads across our large GPU fleet.

THE PERSON:

This is a high-impact leadership role with a big capital expenditure portfolio and direct influence on engineering velocity, developer experience, and technical excellence across the organization.

Nice To Haves

  • Deep-Stack Infrastructure Expertise: Managing distributed systems at hyperscale, with a proven track record in bare metal management and automated provisioning.
  • Accelerated Compute Mastery: Expert-level knowledge of the GPU software stack, including drivers, kernel tuning, and hardware/software optimizations for AMD or NVIDIA platforms.
  • High-Performance Networking & Storage: Hands-on experience with low-latency fabrics (InfiniBand, RDMA/RoCE) and tiered storage architectures required to prevent GPU starvation.
  • Modern Orchestration Tooling: Deep technical proficiency in container ecosystems (Docker, Kubernetes) and specialized AI schedulers, with experience building automated CI/CD pipelines for infrastructure.
  • Systems-Level Management: In-depth understanding of Linux kernel internals and low-level hardware management protocols like IPMI, Redfish, or BMC configuration.
  • AI/ML Operational Fluency: Practical experience with the AI lifecycle, including LLM pre-training, fine-tuning (LoRA), and RAG pipelines, to better support internal engineering and customer success teams.

Responsibilities

  • Define and execute an infrastructure roadmap aligned with business and engineering growth
  • Own capacity planning and forecasting models to ensure infrastructure scales ahead of demand
  • Partner with Finance on capital planning, TCO optimization, and investment prioritization
  • Build executive-level relationships across Engineering, Product, and Operations
  • Lead architecture decisions for compute, storage, networking, and cluster orchestration
  • Drive reliability, performance, and efficiency improvements across the shared platform
  • Establish SLAs, metrics, and operational excellence standards for developer-facing services
  • Build, mentor, and scale a high-performing infrastructure organization
  • Foster a culture of engineering excellence, ownership, and continuous improvement
  • Balance competing demands across workload types (dev, test, build, perf, large-scale runs)
  • Champion developer experience and internal customer satisfaction

Benefits

  • AMD benefits at a glance.