About The Position

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity, and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE: AMD AI Framework is seeking a Senior Software Developer to work on Transformer Engine (TE), a high-performance library designed to accelerate Transformer model training using low-precision arithmetic and custom GPU kernels on MI GPUs. You will also play a key role in enhancing the Megatron-LM (ROCm) framework through fused operations that enable scalable LLM training. As part of a highly skilled team, you'll contribute to cutting-edge deep learning infrastructure and integrate performance-critical components into both client products and the open-source ecosystem.

THE PERSON: The ideal candidate is passionate about software engineering and has the leadership skills to drive sophisticated issues to resolution. They communicate effectively and work well with different teams across AMD.

Requirements

  • Programming Languages & Software Development: Proficient in C/C++ and Python, with experience in software design, debugging, performance analysis, and test development.
  • Object-Oriented Design: Solid foundation in object-oriented programming with a focus on writing clean, efficient, and maintainable code.
  • Concurrent and Multithreaded Programming: Experience with modern concurrency models and threading APIs for high-performance computing.
  • GPU and Parallel Computing: Familiar with GPU programming using HIP, CUDA, and OpenCL, with a foundational understanding of deep learning.
  • Team Collaboration and Communication: Strong problem-solving and communication skills, with a proven ability to work effectively in collaborative team settings.

Nice To Haves

  • Deep Learning Optimization: Experience analyzing deep learning workloads with an emphasis on maximizing throughput and performance.
  • Numerical Computing: Understanding of floating-point arithmetic and its impact on accuracy and precision in scientific computations.
  • Development Tools and Processes: Experienced with GitHub, CI/CD workflows, and debugging/profiling tools in Linux-based development environments.

Responsibilities

  • Library Optimization: Optimize open-source deep learning libraries, including Megatron and Transformer Engine, for peak performance on AMD GPUs.
  • Model Performance Scaling: Analyze and optimize deep learning models for AMD GPUs across both multi-GPU (scale-up) and multi-node (scale-out) systems.
  • Engineering Best Practices: Apply modern software engineering practices while staying current with advancements in hardware, algorithms, and system architecture.
  • Hardware Enablement: Contribute to the bring-up and development of new AMD ASICs and GPU hardware platforms.
  • Data-Driven Optimization: Use performance data and profiling insights to drive optimizations and influence AMD’s deep learning technology roadmap.
  • Debugging and Innovation: Debug existing systems and explore more efficient alternatives to improve performance and maintainability.
  • Collaboration and Partnerships: Work closely with internal GPU library teams and external partners to optimize training workloads through technical collaboration.

Benefits

  • AMD benefits at a glance.

What This Job Offers

Job Type

Full-time

Career Level

Mid Level

Number of Employees

5,001-10,000 employees
