Voice AI Systems Engineer

Known · San Francisco, CA
$225,000 - $330,000 · Onsite

About The Position

Known is a matchmaker that talks to users and supports them like a friend. Our mission is to empower humanity by applying general intelligence to human connection. Users join Known by telling us their life story: on average, new users talk to our AI voice agent for 27 minutes, giving us a uniquely intimate multi-modal dataset.

We are a team of engineers who have built some of the most widely used AI-driven consumer products, including Uber Eats, Uber, Faire, and Afterpay. We love to work hard, with a high degree of autonomy and ownership, and we work together in Cow Hollow, San Francisco.

We're looking for founding voice AI systems engineers to build and scale Known's core voice systems architecture, powering our voice-led onboarding and user experiences. This is a unique opportunity to work with a hyper-personalized dataset, combining voice transcripts, images, and structured user data to power real-time, personalized, AI voice-led conversations at scale. You'll work directly with Chen Peng, former head of ML at Uber Eats and Faire.

Requirements

  • 3-5 Years in ML/Systems: Proven experience deploying high-scale models in production, with a focus on audio processing or real-time streaming.
  • The Voice Stack: Deep familiarity with modern STT/TTS frameworks (e.g., ElevenLabs, LiveKit, VITS, and Sesame) and audio libraries like Librosa or FFmpeg.
  • Agentic Conversational AI: Experience building "brain" logic for LLMs using tools like LangGraph or Haystack to manage complex, non-linear dialogue (see the sketch after this list).
  • Production-Hardened: You've optimized model inference for speed using TensorRT, ONNX, or Triton, and you're comfortable in a Docker/Kubernetes/cloud environment.
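To make the "brain" requirement concrete, here is a minimal sketch of non-linear dialogue control using LangGraph. The state schema, node names, and routing heuristic are all invented for illustration; production nodes would call an LLM rather than return canned strings.

```python
# Hypothetical sketch: a dialogue graph that can loop into clarification,
# continue the interview, or end, based on conversation state.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class DialogueState(TypedDict):
    turns: list[str]  # conversation so far (STT transcripts + agent replies)
    intent: str       # what the user seems to want next (invented field)

def agent_turn(state: DialogueState) -> dict:
    # Production code would call an LLM here; this stub just logs a reply.
    reply = f"(agent reply #{len(state['turns']) + 1})"
    return {"turns": state["turns"] + [reply]}

def clarify_turn(state: DialogueState) -> dict:
    return {"turns": state["turns"] + ["(clarifying question)"], "intent": "answer"}

def route(state: DialogueState) -> str:
    # Non-linear control flow: end the interview, clarify, or keep going.
    if len(state["turns"]) >= 5:
        return "end"
    return "clarify" if state["intent"] == "ambiguous" else "continue"

g = StateGraph(DialogueState)
g.add_node("agent", agent_turn)
g.add_node("clarify", clarify_turn)
g.add_edge(START, "agent")
g.add_conditional_edges("agent", route, {"continue": "agent", "clarify": "clarify", "end": END})
g.add_edge("clarify", "agent")
app = g.compile()
print(app.invoke({"turns": [], "intent": "ambiguous"}))
```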

Responsibilities

  • Low-Latency Orchestration: Architecting the real-time pipeline between STT (speech-to-text), LLM reasoning, and TTS (text-to-speech) to ensure conversational fluidity (<500ms response times); see the first sketch after this list.
  • Voice Personalization & Memory: Building systems that allow our AI to remember not just what a user said but how they said it, incorporating tone and sentiment into long-term user profiles.
  • Audio Intelligence: Implementing and fine-tuning Voice Activity Detection (VAD) and interrupt-handling logic so the AI feels responsive, empathetic, and polite during the onboarding interview.
  • Streaming Infrastructure: Operating robust WebRTC- or WebSocket-based systems that handle high-concurrency voice streams without sacrificing audio fidelity.
  • Evals for Voice: Developing custom evaluation frameworks to measure "conversational success," going beyond word error rate (WER) to assess personality, warmth, and engagement; see the second sketch below.
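For a flavor of the orchestration work, here is a library-agnostic asyncio sketch of the STT → LLM → TTS hot path with barge-in handling. Every function (stt_stream, llm_reply, speak) is a hypothetical stand-in for a real provider; the point is that TTS playback runs as a cancellable task, so fresh user speech interrupts the agent mid-reply.

```python
import asyncio

async def stt_stream(q: asyncio.Queue):
    # Stand-in for a streaming STT client pushing final transcripts.
    for utterance in ["tell me about yourself", "wait, actually..."]:
        await asyncio.sleep(0.4)            # simulated user speech
        await q.put(utterance)

async def llm_reply(text: str) -> str:
    await asyncio.sleep(0.1)                # simulated LLM latency budget
    return f"response to: {text!r}"

async def speak(text: str):
    # Stand-in for streaming TTS playback, cancellable between chunks.
    for chunk in text.split():
        await asyncio.sleep(0.15)           # simulated audio chunk playout
        print("tts>", chunk)

async def orchestrate():
    q: asyncio.Queue = asyncio.Queue()
    stt_task = asyncio.create_task(stt_stream(q))
    playback = None
    for _ in range(2):
        transcript = await q.get()
        if playback and not playback.done():
            playback.cancel()               # barge-in: user spoke over the agent
        reply = await llm_reply(transcript)
        playback = asyncio.create_task(speak(reply))
    await asyncio.gather(stt_task, playback, return_exceptions=True)

asyncio.run(orchestrate())
```

Running this, the second utterance arrives while the first reply is still playing out, so the first playback task is cancelled mid-stream, which is exactly the responsiveness the <500ms target is about.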
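On the evals side, here is a hypothetical "conversational success" scorecard computed from turn logs. The field names, metrics, and thresholds are invented for illustration; real evals would likely pair heuristics like these with model-graded rubrics for warmth and personality.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str       # "user" or "agent"
    seconds: float     # duration of the turn's audio
    interrupted: bool  # did this turn cut the other speaker off?

def score_call(turns: list[Turn]) -> dict[str, float]:
    # Heuristics that go beyond WER: is the user opening up, and does the
    # agent feel polite? Both are proxies for engagement, not ground truth.
    user_time = sum(t.seconds for t in turns if t.speaker == "user")
    total = sum(t.seconds for t in turns) or 1.0
    barge_ins = sum(t.interrupted for t in turns if t.speaker == "agent")
    return {
        "user_talk_ratio": user_time / total,  # high = user is opening up
        "agent_barge_ins": float(barge_ins),   # low = agent feels polite
        "turns": float(len(turns)),
    }

print(score_call([Turn("user", 30.0, False), Turn("agent", 5.0, True)]))
```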