LLM Inference Engineer

Periodic Labs, Menlo Park, CA

About The Position

You will integrate, optimize, and operate large-scale inference systems that power AI-driven scientific research. You will build and maintain high-performance serving infrastructure delivering low-latency, high-throughput access to large language models across thousands of GPUs, and work closely with researchers and engineers to bring cutting-edge inference into large-scale reinforcement learning workloads. You will build tooling, directly support frontier-scale experiments to make Periodic Labs the world’s best AI + science lab, and contribute to open-source LLM inference software.

Requirements

  • Experience optimizing inference for the largest open-source models.
  • Familiarity with high-performance model-serving frameworks such as TensorRT-LLM, vLLM, or SGLang.
  • Knowledge of distributed inference techniques including tensor/expert/pipeline parallelism, speculative decoding, and KV cache management.
  • Experience optimizing GPU utilization and latency for reinforcement learning.
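As context for the KV cache management mentioned above, here is a minimal sketch of paged KV-cache block allocation in the spirit of vLLM-style PagedAttention. All class and method names are hypothetical, not taken from any specific framework:

```python
# Illustrative sketch: a serving engine maps each sequence's KV cache onto
# fixed-size blocks from a bounded pool, rather than one contiguous buffer.
# This is a simplified model, not an actual framework API.

class BlockAllocator:
    """Hands out fixed-size KV-cache blocks from a bounded GPU memory pool."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size              # tokens stored per block
        self.free_blocks = list(range(num_blocks))

    def allocate_for(self, num_tokens: int) -> list[int]:
        """Reserve enough blocks to hold num_tokens of KV entries."""
        needed = -(-num_tokens // self.block_size)  # ceiling division
        if needed > len(self.free_blocks):
            # A real engine would preempt or swap a sequence here.
            raise MemoryError("KV cache exhausted")
        return [self.free_blocks.pop() for _ in range(needed)]

    def free(self, blocks: list[int]) -> None:
        """Return a finished sequence's blocks to the pool."""
        self.free_blocks.extend(blocks)


# A 300-token sequence with 16-token blocks needs ceil(300 / 16) = 19 blocks.
alloc = BlockAllocator(num_blocks=64, block_size=16)
block_table = alloc.allocate_for(300)
print(len(block_table))  # 19
alloc.free(block_table)
```

Because blocks are fixed-size and non-contiguous, memory fragmentation stays low and finished sequences return capacity to the pool immediately, which is what lets a scheduler pack many concurrent sequences onto one GPU.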

Responsibilities

  • Integrate, optimize, and operate large-scale inference systems.
  • Build and maintain high-performance serving infrastructure for large language models.
  • Deliver low-latency, high-throughput access to models across thousands of GPUs.
  • Work closely with researchers and engineers on large-scale reinforcement learning workloads.
  • Build tools to support frontier-scale experiments.
  • Contribute to open-source LLM inference software.