About The Position

Plaud is building the world's most trusted AI work companion for professionals, elevating productivity and performance through note-taking solutions loved by over 1,500,000 users worldwide since 2023. With a mission to amplify human intelligence, Plaud develops next-generation intelligence infrastructure and interfaces to capture, extract, and utilize what you say, hear, see, and think. Plaud Inc. is a Delaware-incorporated, San Francisco-based company pushing the boundary of human–AI intelligence through a hardware–software combination. With SOC 2, HIPAA, GDPR, ISO 27001, ISO 27701, and EN 18031 compliance, Plaud is committed to the highest standards of data security and privacy protection.

Requirements

  • Proven track record of building and training large-scale audio or speech models from the ground up, whether that involves unified SpeechLLMs, advanced ASR, expressive TTS, or generative audio architectures.
  • A love of living at the intersection of research and engineering, equally eager to design novel sequence modeling architectures one day and debug distributed training clusters the next.
  • Deep comfort traversing the entire stack, from fundamental signal processing and raw acoustic representations to massive foundation model training and edge-device optimization.
  • Deep expertise in PyTorch or JAX, with battle scars from optimizing large-scale distributed training runs, managing GPU memory utilization, and resolving complex performance bottlenecks.
  • Ability to thrive in a fast-paced, high-growth startup environment where you are expected to take extreme ownership of ambiguous problems and drive them directly into production.
  • An obsession with building AI systems that natively understand and generate speech, ultimately creating a hardware-software AI companion that amplifies human productivity.

Nice To Haves

  • Text-based LLMs: Hands-on experience with core text-based Large Language Model pretraining, instruction tuning, or RLHF.
  • Neural Audio Codecs: Hands-on experience designing and training state-of-the-art neural audio codecs for streamable, high-fidelity audio.
  • Generative Architectures: Designing and training diffusion models, flow matching, or autoregressive architectures specifically for speech and voice generation.
  • Alignment & Steerability: Applying Reinforcement Learning (RL) techniques (like RLHF or GRPO) to improve conversational cadence, steerability, and alignment in foundation models.
  • Deep System Optimization: End-to-end inference and performance optimization, leveraging high-throughput serving frameworks (e.g., vLLM, TensorRT-LLM, SGLang) to minimize latency for real-time cloud streaming.
  • Large-Scale Infrastructure: Managing massive GPU clusters, utilizing advanced distributed training frameworks (e.g., FSDP, DeepSpeed), and navigating orchestration tools like Kubernetes.

Responsibilities

  • Building and training large-scale audio or speech models from the ground up, such as unified SpeechLLMs, advanced ASR, expressive TTS, and generative audio architectures.
  • Designing novel sequence modeling architectures.
  • Debugging distributed training clusters.
  • Traversing the entire stack from fundamental signal processing and raw acoustic representations to massive foundation model training and edge-device optimization.
  • Optimizing large-scale distributed training runs.
  • Managing GPU memory utilization.
  • Resolving complex performance bottlenecks.
  • Taking extreme ownership of ambiguous problems and driving them directly into production.
  • Building AI systems that natively understand and generate speech, ultimately creating a hardware-software AI companion that amplifies human productivity.
  • Contributing to core text-based Large Language Model pretraining, instruction tuning, or RLHF.
  • Designing and training state-of-the-art neural audio codecs for streamable, high-fidelity audio.
  • Designing and training diffusion models, flow matching, or autoregressive architectures specifically for speech and voice generation.
  • Applying Reinforcement Learning (RL) techniques (like RLHF or GRPO) to improve conversational cadence, steerability, and alignment in foundation models.
  • Optimizing end-to-end inference and serving performance, leveraging high-throughput serving frameworks (e.g., vLLM, TensorRT-LLM, SGLang) to minimize latency for real-time cloud streaming.
  • Managing massive GPU clusters, utilizing advanced distributed training frameworks (e.g., FSDP, DeepSpeed), and navigating orchestration tools like Kubernetes.

Benefits

  • Opportunity to be an early, foundational member of our core SpeechLLM lab, with meaningful ownership and impact on a fast-growing startup.
  • $180K–$270K base salary + performance bonus + equity.
  • Top-tier healthcare for employees and dependents, including dental and vision, with a generous employer subsidy.
  • 401(k) plan for full-time employees with company matching.
  • Unlimited PTO, plus 13 paid holidays.
  • 12 weeks of paid parental leave to spend time with your new family, regardless of gender.
  • Choice of top-of-the-line laptops/workstations, annual offsites, and a fully stocked office.