Principal Performance Engineer Lead

Akamai
Cambridge, MA
Remote

About The Position

The Akamai Inference Cloud team is part of Akamai's Cloud Technology Group. We design and operate AI platforms that enable customers to run models with unmatched performance, compliance, and economics. The Model Intelligence & Lifecycle team owns the end-to-end model lifecycle, from validation and security scanning through quantization, optimization, and monitoring. We ensure every model meets rigorous standards for quality, safety, and performance.

As an ML Performance Engineer, you will optimize inference performance across the Akamai Inference Cloud. Your focus will be at the intersection of speed and accuracy: applying techniques like quantization, speculative decoding, and hardware-aware scheduling to maximize throughput and minimize latency. You will collaborate closely with hardware performance engineers to deliver end-to-end optimization.
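To make one of these techniques concrete, here is a minimal sketch of greedy speculative decoding: a cheap draft model proposes a few tokens and the expensive target model verifies them, accepting the longest matching prefix. Both "models" below are hypothetical stand-in functions, not a real serving integration; a production system would verify the draft tokens in a single batched target-model forward pass.

    # Toy sketch of greedy speculative decoding. The draft and target
    # "models" are hypothetical stand-in functions over a token context.
    from typing import List

    def draft_next(ctx: List[int]) -> int:       # cheap approximate model
        return (sum(ctx) * 31 + 7) % 100

    def target_next(ctx: List[int]) -> int:      # expensive reference model
        s = sum(ctx)
        return (s * 31 + 7) % 100 if s % 3 else (s + 1) % 100

    def speculative_step(ctx: List[int], k: int = 4) -> List[int]:
        # 1. Draft model proposes k tokens autoregressively.
        proposal, draft_ctx = [], list(ctx)
        for _ in range(k):
            tok = draft_next(draft_ctx)
            proposal.append(tok)
            draft_ctx.append(tok)
        # 2. Target model verifies the proposals position by position
        #    (a real system does this in one batched forward pass).
        accepted, verify_ctx = [], list(ctx)
        for tok in proposal:
            expected = target_next(verify_ctx)
            if tok != expected:
                accepted.append(expected)  # emit the target's token, stop
                break
            accepted.append(tok)
            verify_ctx.append(tok)
        # Production variants also append one bonus target token when
        # every proposal is accepted; omitted here for brevity.
        return accepted

    ctx = [1, 2, 3]
    for _ in range(5):
        ctx += speculative_step(ctx)
    print(ctx)

When the draft model agrees with the target most of the time, each target verification yields several tokens at once, which is where the throughput win comes from.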

Requirements

  • 12+ years of relevant experience with a Bachelor's or Master's degree in Computer Science, Machine Learning, or a related field
  • Hands-on experience optimizing LLM inference performance (quantization, speculative decoding, model compression, etc.; a minimal quantization sketch follows this list)
  • Solid understanding of transformer architectures and how design choices affect latency, throughput, and accuracy
  • Experience with inference serving frameworks such as vLLM, TensorRT-LLM, Triton, or similar systems
  • Proficiency in Python and C++, with experience profiling and optimizing compute-intensive workloads
  • Familiarity with hardware-aware optimization, including GPU/accelerator scheduling and memory management trade-offs
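One common form of the quantization work referenced above is post-training dynamic quantization. Below is a minimal PyTorch sketch, assuming a toy stand-in model rather than a real LLM; an actual task would also validate accuracy on held-out data.

    # Minimal sketch: post-training dynamic quantization of linear layers.
    import torch
    import torch.nn as nn

    # Hypothetical stand-in model; a real workload would load an actual LLM.
    model = nn.Sequential(
        nn.Linear(512, 2048),
        nn.ReLU(),
        nn.Linear(2048, 512),
    ).eval()

    # Weights of nn.Linear modules are quantized to int8 ahead of time;
    # activations are quantized dynamically at inference time.
    quantized = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    x = torch.randn(1, 512)
    with torch.no_grad():
        baseline = model(x)
        optimized = quantized(x)

    # The speed/accuracy trade-off in practice: smaller, faster weights
    # at the cost of a bounded numerical deviation from the baseline.
    print("max abs diff:", (baseline - optimized).abs().max().item())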

Responsibilities

  • Applying and evaluating quantization, distillation, and pruning techniques to optimize model performance while preserving accuracy
  • Designing hardware-aware model placement and scheduling strategies to match models with optimal compute resources
  • Implementing and tuning speculative decoding, KV-cache optimization, and batching strategies to improve inference throughput and latency
  • Building benchmarking and profiling pipelines to measure model-layer performance across architectures, hardware, and serving configurations (a minimal sketch of such a benchmark follows this list)
  • Mentoring and guiding engineers on the team through code reviews, design discussions, and technical problem-solving
  • Collaborating with hardware performance engineers to identify and resolve end-to-end performance bottlenecks across the inference stack
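As an illustration of the benchmarking work above, here is a minimal latency/throughput sweep over batch sizes. The model is a hypothetical stand-in; a production pipeline would sweep real models, hardware targets, and serving configurations, and would measure on-accelerator time rather than wall clock.

    # Minimal sketch of a latency/throughput benchmark across batch sizes.
    import time
    import torch
    import torch.nn as nn

    model = nn.Linear(1024, 1024).eval()  # stand-in for a real model

    def benchmark(batch_size: int, iters: int = 50):
        x = torch.randn(batch_size, 1024)
        with torch.no_grad():
            for _ in range(5):                 # warm-up iterations
                model(x)
            start = time.perf_counter()
            for _ in range(iters):
                model(x)
            elapsed = time.perf_counter() - start
        latency_ms = elapsed / iters * 1e3         # mean per-batch latency
        throughput = batch_size * iters / elapsed  # samples per second
        return latency_ms, throughput

    for bs in (1, 8, 32, 128):
        lat, tput = benchmark(bs)
        print(f"batch={bs:4d}  latency={lat:7.3f} ms  throughput={tput:10.1f}/s")

Sweeping batch size like this exposes the latency/throughput trade-off directly: larger batches amortize fixed costs for higher throughput at the price of higher per-request latency.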

Benefits

  • Opportunities to grow, flourish, and achieve great things
  • Health
  • Finances
  • Family
  • Time at work
  • Time pursuing other endeavors