Sr. Staff Software Engineer, Systems Infrastructure

LinkedIn
Mountain View, CA
$198,000 - $326,000
Hybrid

About The Position

This role will be based in Sunnyvale or Mountain View, CA. At LinkedIn, our approach to flexible work is centered on trust and optimized for culture, connection, clarity, and the evolving needs of our business. The work location of this role is hybrid, meaning it will be performed both from home and from a LinkedIn office on select days, as determined by the business needs of the team.

LinkedIn’s AI Infrastructure organization is responsible for building the foundational platforms that power AI across LinkedIn. The LLM Serving team builds the critical infrastructure that enables efficient, reliable, and large-scale deployment of large language models and other advanced AI models in production. This team sits at the center of LinkedIn’s AI platform, owning the layer between model training and production serving. The work focuses on making large-scale models run faster, cheaper, and more efficiently on GPUs at LinkedIn scale. The team builds and extends high-performance serving infrastructure and contributes to leading open-source technologies such as SGLang, vLLM, and related model serving frameworks.

We are looking for a Senior Staff Software Engineer with deep expertise at the intersection of systems, machine learning, GPU infrastructure, and large-scale inference. This is a highly technical, high-leverage role for someone who enjoys going deep into how models interact with runtimes, compilers, and hardware, and who wants to drive meaningful improvements in performance, cost, latency, and scalability across LinkedIn’s AI systems.

Requirements

  • BA/BS degree in Computer Science or related technical field, or equivalent practical experience
  • 8+ years of experience in software engineering, distributed systems, infrastructure, or machine learning systems
  • Experience building or optimizing large-scale production ML systems, model serving platforms, or AI infrastructure
  • Experience with GPU-based systems, CUDA, kernel optimization, or hardware-aware performance tuning
  • Experience with large-scale inference systems, including latency, throughput, reliability, and cost optimization
  • Experience with deep learning frameworks such as PyTorch, TensorFlow, or similar
  • Experience programming in one or more languages such as C++, Go, Python, or Java

Nice To Haves

  • Deep experience with LLM serving infrastructure, AI inference platforms, or large-scale model deployment systems
  • Familiarity with or contributions to open-source serving frameworks such as vLLM, SGLang, Triton, TensorRT, Ray, or similar technologies
  • Experience with ML compilers, runtimes, or graph optimization frameworks such as XLA, TVM, TensorRT, Triton, or similar
  • An understanding of model optimization techniques such as quantization, pruning, compression, batching, caching, and memory optimization
  • Experience improving GPU utilization and cost/performance efficiency for large-scale ML workloads
  • Experience building high-performance online or offline inference pipelines
  • An understanding of distributed systems, scheduling, resource management, and large-scale infrastructure operations
  • Experience operating across the stack from model-level optimization to runtime, compiler, kernel, and hardware-level performance improvements
  • Experience influencing technical direction across teams and partnering effectively with ML researchers, infrastructure engineers, and product teams

Responsibilities

  • Lead the design, development, and optimization of LinkedIn’s large-scale LLM serving infrastructure
  • Drive performance improvements across AI inference systems, including latency, throughput, GPU utilization, and cost efficiency
  • Build and scale online and offline inference systems for LLMs and other AI models
  • Optimize model execution across the full stack, including model architecture, runtime, compiler, kernel, and hardware layers
  • Drive model optimization techniques such as quantization, pruning, compression, batching, and memory optimization
  • Improve GPU efficiency through low-level systems work, including kernel-level optimization, runtime tuning, and hardware-aware performance improvements
  • Partner closely with ML, infrastructure, and product teams to identify serving bottlenecks and improve end-to-end model performance
  • Contribute to and/or extend open-source LLM serving frameworks such as SGLang, vLLM, Triton, or similar technologies
  • Set technical direction for model serving, inference performance, and next-generation AI infrastructure design
  • Mentor engineers and influence technical strategy across AI Infrastructure

Benefits

  • Annual performance bonus
  • Stock
  • Benefits