AI Software Engineer

Zoom
Seattle, WA
$151,800 - $332,200
Hybrid

About The Position

The AI Infrastructure team at Zoom enables high-performance AI across Zoom’s products and services. The team builds the core systems that support model training, deployment, and inference at scale, driving innovation in real-time communication, computer vision, and natural language understanding. The role involves designing, implementing, and owning the inference systems that serve Zoom’s AI models at production scale across those workloads. The work is hands-on, spanning kernel-level optimization, inference framework internals, and production serving infrastructure, and involves collaborating with research and platform teams to optimize latency, throughput, and cost.

Requirements

  • A Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or a related technical field
  • 5+ years of software engineering experience, with significant time spent on inference systems or ML infrastructure at production depth
  • Hands-on experience with at least one major inference framework: vLLM, TensorRT-LLM, SGLang, or ONNX Runtime (serving, not just export)
  • GPU programming experience: CUDA kernel development, memory optimization, profiling with Nsight or equivalent
  • Production experience serving LLMs or large vision models: you've owned latency SLOs, debugged throughput regressions, and shipped optimizations that measurably improved performance
  • Depth in at least two of: speculative decoding, continuous batching, KV cache design, quantization pipelines, prefill/decode disaggregation
  • Strong systems instincts in Python and C++; ability to read and modify framework internals

Nice To Haves

  • Advanced degrees (Master’s or PhD) are advantageous
  • Experience with MoE models or 100B+ parameter deployments
  • Familiarity with disaggregated serving architectures or multi-node inference
  • Background in compiler-level optimization (XLA, Triton, or similar)

Responsibilities

  • Design and build high-performance inference serving systems for large-scale transformer and multimodal models (including 100B+ and MoE architectures)
  • Implement and tune inference optimizations: speculative decoding, continuous batching, KV cache management, prefill/decode disaggregation, and quantization (INT4/INT8/FP8)
  • Contribute to and customize inference frameworks (vLLM, TensorRT-LLM, SGLang, or equivalent) for Zoom's production requirements
  • Write and profile CUDA kernels and custom ops where framework-level optimization is insufficient
  • Own end-to-end deployment: from model packaging and serving API design to latency SLO monitoring and incident response
  • Partner with research to translate model architecture changes into inference-efficient implementations
  • Drive technical design and set the bar for inference engineering practices across the team

Benefits

  • Award-winning workplace culture
  • Variety of perks, benefits, and options to help employees maintain their physical, mental, emotional, and financial health
  • Support for work-life balance
  • Opportunities for employees to contribute to their communities in meaningful ways