Voice AI Systems Engineer

Known
San Francisco, CA (Onsite)
$225,000 - $330,000

About The Position

We’re looking for founding voice AI systems engineers to build and scale Known’s core voice systems architecture, which powers our voice-led onboarding and user experiences. This is a unique opportunity to work with a hyper-personalized dataset that combines voice transcripts, images, and structured user data to power real-time, personalized, voice-led AI conversations at scale. You’ll work directly with Chen Peng, former head of ML at Uber Eats and Faire.

Requirements

  • 3-5 Years in ML/Systems: Proven experience deploying models at high scale in production, particularly for audio processing or real-time streaming.
  • The Voice Stack: Deep familiarity with modern speech tooling, including STT/TTS models and services (e.g., ElevenLabs, VITS, Sesame), real-time voice frameworks such as LiveKit, and audio libraries like Librosa or FFmpeg.
  • Agentic Conversational AI: Experience building the "brain" logic for LLM agents with frameworks like LangGraph or Haystack to manage complex, non-linear dialogue.
  • Production-Hardened: You’ve optimized model inference for speed using TensorRT, ONNX Runtime, or Triton Inference Server, and you’re comfortable working in a Docker/Kubernetes/cloud environment.

Responsibilities

  • Low-Latency Orchestration: Architecting the real-time pipeline between STT (Speech-to-Text), LLM reasoning, and TTS (Text-to-Speech) to keep conversations fluid, with response times under 500 ms.
  • Voice Personalization & Memory: Building systems that allow our AI to remember not just what a user said, but how they said it, incorporating tone and sentiment into long-term user profiles.
  • Audio Intelligence: Implementing and fine-tuning Voice Activity Detection (VAD) and interrupt-handling logic so the AI feels responsive, empathetic, and polite during the onboarding interview.
  • Streaming Infrastructure: Building and maintaining robust WebRTC- or WebSocket-based systems that handle high-concurrency voice streams without sacrificing audio fidelity.
  • Evals for Voice: Developing custom evaluation frameworks to measure "conversational success," going beyond word error rate (WER) to assess personality, warmth, and engagement.