Performance Analysis Engineer Intern (Summer 2026)

Astera Labs Early Career
San Jose, CA

About The Position

We are seeking a Performance Analysis Engineer Intern to drive system-level performance optimization across large-scale AI training and inference environments. In this role, you will analyze, profile, and optimize distributed workloads running on high-density accelerator clusters, working across the full stack: from ML frameworks and communication libraries to network fabrics and hardware architecture. You will play a critical role in ensuring that next-generation AI workloads achieve near-peak hardware efficiency, while directly influencing software architecture, infrastructure design, and future silicon and networking roadmaps.

Requirements

  • Education: Bachelor’s, Master’s, or PhD in Computer Engineering, Electrical Engineering, or a related field.
  • Hands-on experience optimizing distributed ML workloads across multi-node accelerator clusters.
  • Strong understanding of data parallelism, model parallelism, and pipeline parallelism.
  • Deep knowledge of GPU or accelerator architectures, including compute units, memory hierarchies, and interconnects (PCIe, NVLink, or equivalents).
  • Experience working with NCCL, RCCL, MPI, or similar collective communication frameworks.
  • Strong understanding of high-performance networking technologies (Ethernet, InfiniBand, RoCE) and their impact on distributed workloads.
  • PyTorch & ML Systems Proficiency
  • Advanced experience with PyTorch, including distributed training internals and execution tracing.
  • Ability to diagnose and optimize framework-level and runtime bottlenecks.
  • Comfortable debugging issues across software, firmware, and hardware boundaries.
  • Strong proficiency in Python and C/C++.
  • Experience building performance analysis tools, automation, and benchmarking frameworks.
  • Ability to clearly communicate complex performance findings to cross-functional teams.
  • Comfortable working in fast-moving, ambiguous environments.
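For candidates unfamiliar with the collective communication frameworks named above: libraries like NCCL and MPI center on operations such as all-reduce, where every rank ends up holding the elementwise sum of all ranks' gradient buffers. The following is an illustrative pure-Python sketch of the semantics only, not the ring or tree algorithms those libraries actually run over the interconnect:

```python
from typing import List

def all_reduce_sum(buffers: List[List[float]]) -> List[List[float]]:
    """Toy all-reduce: every 'rank' receives the elementwise sum of
    all ranks' buffers. Real libraries (NCCL, RCCL, MPI) implement
    this with bandwidth-optimal ring/tree algorithms over PCIe,
    NVLink, or the network fabric; this only shows the semantics."""
    # Elementwise sum across ranks...
    reduced = [sum(vals) for vals in zip(*buffers)]
    # ...then every rank gets an identical copy of the result.
    return [list(reduced) for _ in buffers]

# Example: gradient shards from 3 data-parallel ranks
ranks = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(all_reduce_sum(ranks))  # each rank gets [9.0, 12.0]
```

In data-parallel training, this is the step that synchronizes gradients after each backward pass, and its overlap with compute is a common profiling target.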

Responsibilities

  • Cluster-Scale Performance Profiling
  • Collective Library Optimization
  • Network Fabric Analysis
  • Advanced Load Balancing & Traffic Optimization
  • PyTorch Stack Optimization
  • GPU & Accelerator Utilization
  • Performance Modeling & Benchmarking
  • Hardware–Software Co-Design
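As a flavor of the benchmarking work listed above, performance modeling usually starts from a repeatable micro-benchmark harness. A minimal sketch using only the standard library (the warmup and trial counts here are arbitrary illustrative choices):

```python
import time
import statistics
from typing import Callable, Dict

def benchmark(fn: Callable[[], None],
              warmup: int = 3, trials: int = 20) -> Dict[str, float]:
    """Time fn over several trials, discarding warmup runs so cache
    and JIT effects don't skew results. Reports median and tail
    latency, which are more robust than the mean for latency data."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    samples.sort()
    return {
        "median_s": statistics.median(samples),
        "p95_s": samples[int(0.95 * (len(samples) - 1))],
        "max_s": samples[-1],
    }

stats = benchmark(lambda: sum(range(100_000)))
print({k: f"{v:.6f}" for k, v in stats.items()})
```

Cluster-scale profiling layers the same idea over distributed traces (e.g., PyTorch execution traces), but the core discipline of warmup, repeated trials, and percentile reporting is identical.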