Sr. Staff Software Engineer, Systems Infrastructure

LinkedIn · Mountain View, CA

$198,000–$326,000 · Hybrid

About The Position

This role will be based in Sunnyvale or Mountain View, CA. At LinkedIn, our approach to flexible work is centered on trust and optimized for culture, connection, clarity, and the evolving needs of our business. The work location of this role is hybrid, meaning it will be performed both from home and from a LinkedIn office on select days, as determined by the business needs of the team.

LinkedIn’s AI Infrastructure organization is responsible for building the foundational platforms that power AI across LinkedIn. The LLM Serving team builds the critical infrastructure that enables efficient, reliable, and large-scale deployment of large language models and other advanced AI models in production. This team sits at the center of LinkedIn’s AI platform, owning the layer between model training and production serving. The work focuses on making large-scale models run faster, cheaper, and more efficiently on GPUs at LinkedIn scale. The team builds and extends high-performance serving infrastructure and contributes to leading open-source technologies such as SGLang, vLLM, and related model serving frameworks.

We are looking for a Senior Staff Software Engineer with deep expertise at the intersection of systems, machine learning, GPU infrastructure, and large-scale inference. This is a highly technical, high-leverage role for someone who enjoys going deep into how models interact with runtimes, compilers, and hardware, and who wants to drive meaningful improvements in performance, cost, latency, and scalability across LinkedIn’s AI systems.

Requirements

BA/BS degree in Computer Science or related technical field, or equivalent practical experience
8+ years of experience in software engineering, distributed systems, infrastructure, or machine learning systems
Experience building or optimizing large-scale production ML systems, model serving platforms, or AI infrastructure
Experience with GPU-based systems, CUDA, kernel optimization, or hardware-aware performance tuning
Experience with large-scale inference systems, including latency, throughput, reliability, and cost optimization
Experience with deep learning frameworks such as PyTorch, TensorFlow, or similar
Experience programming in one or more languages such as C++, Go, Python, or Java

Nice To Haves

Deep experience with LLM serving infrastructure, AI inference platforms, or large-scale model deployment systems
Familiarity with or contributions to open-source serving frameworks such as vLLM, SGLang, Triton Inference Server, TensorRT, Ray, or similar technologies
Experience with ML compilers, runtimes, or graph optimization frameworks such as XLA, TVM, TensorRT, the Triton kernel language and compiler, or similar
An understanding of model and serving optimization techniques such as quantization, pruning, compression, batching, caching, and memory optimization
Experience improving GPU utilization and cost/performance efficiency for large-scale ML workloads
Experience building high-performance online or offline inference pipelines
An understanding of distributed systems, scheduling, resource management, and large-scale infrastructure operations
Experience operating across the stack from model-level optimization to runtime, compiler, kernel, and hardware-level performance improvements
Experience influencing technical direction across teams and partnering effectively with ML researchers, infrastructure engineers, and product teams

Responsibilities

Lead the design, development, and optimization of LinkedIn’s large-scale LLM serving infrastructure
Drive performance improvements across AI inference systems, including latency, throughput, GPU utilization, and cost efficiency
Build and scale online and offline inference systems for LLMs and other AI models
Optimize model execution across the full stack, including model architecture, runtime, compiler, kernel, and hardware layers
Drive the adoption of model and serving optimizations such as quantization, pruning, compression, batching, and memory optimization
Improve GPU efficiency through low-level systems work, including kernel-level optimization, runtime tuning, and hardware-aware performance improvements
Partner closely with ML, infrastructure, and product teams to identify serving bottlenecks and improve end-to-end model performance
Contribute to and/or extend open-source LLM serving frameworks such as SGLang, vLLM, Triton, or similar technologies
Set technical direction for model serving, inference performance, and next-generation AI infrastructure design
Mentor engineers and influence technical strategy across AI Infrastructure