AI Infrastructure Engineer

Intel Corporation, Hillsboro, OR
Onsite

About The Position

The world is transforming, and so is Intel. Intel is a company of bold and curious inventors and problem solvers who create some of the most astounding technology advancements and experiences in the world. With a legacy of relentless innovation and a commitment to bring smart, connected devices to every person on Earth, our diverse and brilliant teams are continually searching for tomorrow's technology and revel in the challenge that changing the world for the better brings. We work every single day to design and manufacture silicon products that empower people's digital lives. Come join us and do something wonderful.

We are seeking a highly experienced Senior AI/ML Infrastructure Engineer to join our cloud systems team. This role focuses on designing, implementing, and optimizing AI accelerator systems and cloud infrastructure for large-scale machine learning workloads. The ideal candidate will have extensive experience with AI hardware platforms, system-level debugging, and cross-functional collaboration in enterprise environments.

Requirements

  • Bachelor's degree with 6+ years, Master's degree with 4+ years, or PhD with 2+ years of experience, in Computer Science, Electrical Engineering, or a related field.
  • 5+ years of experience in system engineering, platform validation, or related roles.
  • 2+ years of experience successfully bringing up and debugging high-performance AI clusters.
  • 2+ years of experience resolving complex system-level issues in production AI/ML environments.
  • 2+ years of experience in AI cluster design, validation, and production deployment.
  • 2+ years of experience with full-stack debugging, from the hardware layer through the application layer.

Nice To Haves

  • Experience with Intel platforms (Xeon, Gaudi) or similar GPUs or AI accelerators.
  • Familiarity with cloud deployment and containerization.
  • Programming: Expert-level Python.
  • AI/ML Frameworks: Experience with vLLM, PyTorch, TensorFlow, and OpenMPI.
  • System Tools: Linux/Unix administration, Docker, shell scripting.
  • Hardware: Deep understanding of PCIe, memory subsystems, AI accelerators.
  • Protocols: Redfish, IPMI, BMC management.
  • Computer architecture and microprocessor design.
  • AI/ML workload optimization and deployment.
  • System-level debugging and validation methodologies.
  • Enterprise platform security and manageability.

Responsibilities

AI/ML System Engineering:

  • Design and optimize AI accelerator systems (Gaudi, GPU clusters) for production ML workloads.
  • Debug complex PCIe, memory subsystem, and interconnect issues in AI clusters.
  • Validate and integrate cutting-edge GPUs and AI accelerator platforms.

System Integration and Validation:

  • Lead platform bring-up and validation for next-generation AI hardware.
  • Develop comprehensive test plans for AI systems.
  • Collaborate with OEM vendors on BMC firmware integration and system stability.
  • Perform full-stack debugging across hardware, firmware, and software layers.

Infrastructure and Tooling:

  • Develop automated testing frameworks and monitoring solutions.
  • Create diagnostic tools and APIs for system health monitoring.

Leadership and Collaboration:

  • Mentor junior engineers and data center technicians.
  • Lead cross-functional teams through complex technical challenges.
  • Coordinate with hardware, firmware, and software teams on platform readiness.
  • Drive technical decisions and architectural improvements.

Benefits

  • We offer a total compensation package that ranks among the best in the industry. It consists of competitive pay, stock bonuses, and benefit programs which include health, retirement, and vacation.
  • Find out more about the benefits of working at Intel.

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: Ph.D. or professional degree
  • Number of Employees: 5,001-10,000 employees
