Senior Deep Learning Compiler Engineer - XLA

Jobgether
Posted 6 days ago · $152,000 - $241,500 · Remote

About The Position

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Deep Learning Compiler Engineer - XLA in the United States. This role focuses on developing and optimizing compilers for high-performance deep learning workloads on modern GPU architectures. You will design, implement, and tune advanced compiler optimization algorithms to accelerate training and inference for deep learning frameworks at scale. The position involves collaborating with framework teams, hardware engineers, and cross-functional partners to deliver production-grade software that powers next-generation AI systems. You will work on graph partitioning, tensor sharding, performance analysis, and code generation, and contribute to user-facing library features. The role combines deep technical expertise, creativity, and autonomy in a dynamic, research-driven environment with significant impact on AI computing performance.

Requirements

  • Bachelor’s, Master’s, or Ph.D. in Computer Science, Computer Engineering, or related field, or equivalent experience
  • 4+ years of experience in performance analysis, compiler optimization, or deep learning software development
  • Strong C/C++ programming skills with expertise in software design, debugging, and testing
  • Knowledge of CPU, GPU, or other high-performance hardware architectures, including distributed computing
  • Experience with CUDA or OpenCL is desirable
  • Familiarity with XLA, TVM, MLIR, LLVM, OpenAI Triton, or deep learning frameworks such as JAX, PyTorch, or TensorFlow is a strong plus
  • Ability to work independently, define project scope, and deliver high-quality software
  • Excellent communication and collaboration skills; experience mentoring junior engineers is a bonus

Responsibilities

  • Develop compiler optimization techniques for deep learning network graphs
  • Design and implement graph partitioning and tensor sharding for distributed training and inference
  • Optimize performance and analyze computational efficiency on GPU hardware
  • Generate code for NVIDIA GPU backends using MLIR, LLVM, OpenAI Triton, or similar compilers
  • Contribute to user-facing features in deep learning frameworks and related libraries
  • Collaborate with GPU hardware teams to align software features with next-generation architectures
  • Mentor junior engineers and support knowledge sharing within the team

Benefits

  • Competitive salary range ($152,000 - $241,500 USD)
  • Eligibility for equity, bonuses, and comprehensive benefits packages
  • Remote work eligibility with flexibility depending on location
  • Opportunities for professional growth, mentorship, and career development
  • Dynamic, research-driven environment working on cutting-edge AI technology
  • Inclusive, diverse, and collaborative workplace culture


What This Job Offers

  • Job Type: Full-time
  • Career Level: Senior
  • Education Level: Ph.D. or professional degree
