Lightning AI · Posted 13 days ago
$225,000 - $275,000/Yr
Full-time • Mid Level
Hybrid • San Francisco, CA
11-50 employees

Lightning AI is the company reimagining the way AI is built. After creating and releasing PyTorch Lightning in 2019, Lightning AI was launched to reshape the development of artificial intelligence products for commercial and academic use. We are on a mission to simplify AI development, making it accessible to everyone, from solo researchers to large enterprises. By removing the complexity of building and deploying AI tools, we empower innovators to focus on solving real-world problems. Our platform is built to scale with the latest AI advancements while staying intuitive and adaptable, so you can bring your ideas to life. We have offices in New York City, San Francisco, and London and are backed by investors such as Coatue, Index Ventures, Bain Capital Ventures, and Firstminute.

We are seeking a highly skilled AI Optimization Leader to optimize training and inference workloads on compute accelerators and clusters through the Lightning stack, including the Thunder compiler and the broader PyTorch Lightning ecosystem. This role sits at the intersection of deep learning research, compiler development, and large-scale systems optimization. It is also a core pillar of Lightning's business strategy: you will be at the forefront of how we serve high-impact customer deals that rely on advanced optimization services, acting as a key technical and strategic leader for one of the company's major revenue-driving functions. Your work will directly influence customer success, deal structure, and our ability to scale optimization-centric offerings as a competitive advantage.

You will join the Engineering Team and report to our Director of Engineering. This is a hybrid role based in our San Francisco office, with an in-office requirement of two days per week. The salary range for this role is $225,000-$275,000.

Responsibilities

  • Own the technical direction of our performance-oriented model optimization efforts at multiple levels:
      ◦ Graph-level (e.g., operator fusion, kernel scheduling, memory planning)
      ◦ Kernel-level (CUDA, Triton, custom operators for specialized hardware)
      ◦ System-level (distributed training, inference serving at scale)
  • Advance compiler technology by building optimization passes, graph transformations, and integration hooks to accelerate training and inference workloads.
  • Work across the Lightning stack to ensure optimizations are accessible to end users through clean APIs, automated tooling, and seamless integration with PyTorch Lightning and LitServe.
  • Design and implement profiling and debugging tools to analyze model execution, identify bottlenecks, and guide optimization strategies.
  • Collaborate with hardware vendors and ecosystem partners to ensure workloads run efficiently across diverse backends (NVIDIA, AMD, TPU, specialized accelerators).
  • Contribute to open-source projects by developing new features, improving documentation, and supporting community adoption.
  • Engage with researchers and engineers in the community, providing guidance on performance tuning and advocating for the Lightning stack as the go-to optimization stack in ML workflows.
  • Work cross-functionally with Lightning’s product and engineering teams to ensure compiler and optimization improvements align with the broader product vision.
Requirements

  • Strong expertise with deep learning frameworks such as PyTorch, JAX, or TensorFlow.
  • Hands-on experience profiling models on hardware accelerators, identifying bottlenecks, and assessing the effects of optimization changes.
  • Experience with model optimization techniques, including graph-level optimizations, quantization, pruning, mixed precision, or memory-efficient training.
  • Deep understanding of compiler internals (IR design, operator fusion, scheduling, optimization passes) or proven work in performance-critical software.
  • Experience with CUDA, Triton, or other GPU programming models for developing custom kernels.
  • Knowledge of distributed computing and parallelism strategies (data/model/pipeline parallelism, checkpointing, elastic scaling).
  • Familiarity with software engineering practices: designing APIs, building robust tooling, testing, CI/CD for performance-sensitive systems.
  • Proven track record contributing to open-source projects in the AI, scientific computing, or compiler domains.
  • Excellent collaboration and communication skills, with the ability to partner across research, engineering, and external contributors.
  • Bachelor’s degree in Computer Science, Engineering, or a related field.
  • Advanced degree (Master’s or PhD) in machine learning, compilers, or systems highly preferred.
Benefits

  • We offer competitive base salaries and stock options with a one-year cliff (25%) and monthly vesting thereafter.
  • For our international employees, we work with our EOR to pay you in your local currency and provide equitable benefits across the globe.
  • In the US, we offer:
      ◦ Medical, dental, and vision insurance
      ◦ Life and AD&D insurance
      ◦ Flexible paid time off, including winter closure
      ◦ Generous paid family leave benefits
      ◦ $500 monthly meal reimbursement, including groceries and food delivery services
      ◦ $500 one-time home office stipend
      ◦ $1,000 annual learning & development stipend
      ◦ 100% Citibike membership (NYC only)
      ◦ $45/month gym membership
      ◦ Additional various medical and mental health services