AI Infrastructure Intern

Zoom · Seattle, WA
1d · $67 - $107 · Hybrid

About The Position

About the team

The AI Infrastructure team at Zoom is dedicated to building world-class inference infrastructure that powers all of Zoom's AI services. Our mission is to deliver high efficiency, scalability, and cost optimization across a wide range of AI applications, including large language models (LLMs), vision-language models (VLMs), automatic speech recognition (ASR), machine translation (MT), and deep reasoning agents. We focus on seamless collaboration between small and large models, ensuring cost-effective, privacy-preserving, and high-quality AI services for our customers.

What you can expect

As an AI Intern on Zoom's AI Infrastructure team, you will design, optimize, and scale the runtimes and services that power our AI models. Your work will directly improve efficiency, reduce latency, and lower costs across Zoom's AI stack, ensuring reliable, high-performance AI experiences for millions of users.

Requirements

  • Track record of building scalable, reliable AI infrastructure under real-world production constraints.
  • Deep experience with transformer-based models and inference frameworks (vLLM, TensorRT-LLM, SGLang, ONNX Runtime).
  • Proficiency in Python and C++ (Java is a plus).
  • Hands-on experience with PyTorch (torch.compile, graph-level optimization) and/or TensorFlow; a short sketch follows this list.
  • Knowledge of low-level hardware concepts (GPU memory hierarchy, caching, vectorization).
  • Familiarity with cloud platforms (AWS, GCP, Azure) and AI deployment tools (Docker, Kubernetes, MLflow).
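
To make the graph-level optimization requirement concrete, here is a minimal torch.compile sketch, assuming a PyTorch >= 2.0 environment; the toy MLP, shapes, and tolerance are illustrative placeholders, not part of Zoom's actual stack:

    import torch
    import torch.nn as nn

    class TinyMLP(nn.Module):
        """Toy stand-in for a real serving workload."""
        def __init__(self, dim: int = 1024):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.net(x)

    model = TinyMLP().eval()
    # torch.compile captures the model as a graph and lowers it through
    # TorchInductor, enabling operator fusions that eager mode cannot perform.
    compiled = torch.compile(model)

    x = torch.randn(8, 1024)
    with torch.no_grad():
        eager_out = model(x)
        compiled_out = compiled(x)  # first call compiles; later calls reuse the artifact
        assert torch.allclose(eager_out, compiled_out, atol=1e-4)

The first call to the compiled model pays a one-time compilation cost; subsequent calls with the same shapes hit the cached graph, which is where the latency and throughput gains come from.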

Responsibilities

  • Develop and optimize AI runtimes for LLMs, ASR, MT, or deep reasoning systems with a focus on performance and cost efficiency.
  • Build scalable, highly available infrastructure services to support enterprise-grade AI workloads.
  • Optimize models for edge devices (laptops, PCs, and mobile devices) as well as large-scale cloud deployments.
  • Continuously improve latency, throughput, and efficiency across serving pipelines.
  • Rapidly integrate and optimize new industry models to stay ahead in AI infrastructure.
  • Apply inference optimization techniques such as quantization, kernel fusion, torch.compile, graph optimization, KV caching, and continuous batching; a KV-cache sketch follows this list.
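
As a minimal sketch of the KV-cache technique named above, the greedy decode loop below reuses cached keys/values between steps instead of recomputing attention over the full prefix each step. It assumes a Hugging Face transformers causal LM; the model name and prompt are illustrative placeholders, not Zoom's serving stack:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # "gpt2" is a placeholder model chosen only for illustration.
    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    generated = tok("KV caching speeds up decoding because", return_tensors="pt").input_ids
    past_key_values = None  # keys/values of all previously processed tokens

    with torch.no_grad():
        for _ in range(32):
            if past_key_values is None:
                # Prefill: run the full prompt once and populate the cache.
                out = model(input_ids=generated, use_cache=True)
            else:
                # Decode: feed only the newest token; attention reuses the
                # cache rather than recomputing the whole prefix.
                out = model(input_ids=next_token,
                            past_key_values=past_key_values,
                            use_cache=True)
            past_key_values = out.past_key_values
            next_token = out.logits[:, -1:].argmax(dim=-1)  # greedy pick
            generated = torch.cat([generated, next_token], dim=-1)

    print(tok.decode(generated[0]))

With the cache, each decode step is O(1) in new key/value computation rather than rescanning the prefix, which is the basic lever behind the latency and throughput goals described in this role.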