Senior ML Performance Engineer

Lemurian Labs, Toronto, ON

About The Position

We're looking for a Senior ML Performance Engineer to architect and lead our Performance Testing Platform from the ground up. You'll be the technical authority on how we measure, validate, and optimize the performance of large language models — including Llama 3.2 70B, DeepSeek, and others — before and after compiler optimization on modern GPU architectures. This is a high-impact role at the intersection of ML systems, GPU architecture, and performance engineering. You'll build the infrastructure that proves our compiler delivers real, measurable value — and you'll work directly with compiler and ML engineers to drive the optimizations that get us there.

Requirements

  • BS degree in computer science, computer engineering, electrical engineering, or equivalent practical experience
  • 7+ years of experience in performance engineering, benchmarking, or systems engineering roles
  • Deep understanding of ML inference workloads, particularly transformer-based models and LLMs
  • Hands-on experience with GPU programming and optimization (CUDA, ROCm, or similar)
  • Strong programming skills in Python and C/C++
  • Proven track record of building performance testing infrastructure or benchmarking platforms from scratch
  • Experience with ML frameworks (PyTorch, TensorFlow, ONNX Runtime, vLLM, TensorRT-LLM, etc.)
  • Proficiency with profiling and debugging tools for GPU workloads
  • Strong analytical skills with the ability to design experiments, analyze results, and communicate findings clearly
  • Experience with CI/CD systems and test automation frameworks

Nice To Haves

  • Master's or PhD in computer science, computer engineering, electrical engineering, or a related field
  • Experience with AMD GPUs (Instinct MI200/MI300 series) and the ROCm ecosystem
  • Knowledge of compiler optimization techniques and their impact on performance
  • Experience with distributed inference and multi-GPU workloads
  • Familiarity with ML model quantization, pruning, and other optimization techniques
  • Background in high-performance computing or systems-level optimization
  • Experience with infrastructure-as-code (Kubernetes, Docker, Terraform)
  • Contributions to open-source ML or systems projects

Responsibilities

  • Design and build a comprehensive performance testing platform for evaluating LLM inference workloads across GPU clusters
  • Define and implement the benchmarking methodology, metrics, and test suites that measure latency, throughput, memory utilization, power consumption, and model accuracy
  • Establish baseline performance for unoptimized models (Llama 3.2 70B, DeepSeek, etc.) and validate post-optimization improvements
  • Develop automated testing pipelines for continuous performance validation across compiler releases and model updates
  • Investigate performance bottlenecks using profiling tools (ROCm profilers, GPU traces, system-level monitoring) and work with the compiler team to drive optimizations
  • Create dashboards and reporting that provide clear visibility into performance trends, regressions, and wins
  • Collaborate cross-functionally with compiler engineers, ML engineers, and DevOps to ensure performance testing is integrated into our development workflow
  • Document best practices for performance testing and optimization of ML workloads on GPU hardware

Benefits

  • Competitive compensation including equity and company bonus opportunities
  • Medical, dental, and vision coverage, a retirement savings plan, and supplemental wellness benefits