Audio Solutions Architect

Innodata Inc.
$150,000 - $230,000 | Hybrid

About The Position

Innodata builds the high-quality voice and audio datasets that power the world's leading speech AI — text-to-speech, speech recognition, and the new generation of speech-to-speech and conversational voice models. We're hiring an Audio Solutions Architect to be both the technical partner to our customers in presales and the external technical voice of our audio practice. This is a hybrid role with two equally weighted halves. In presales, you sit with a frontier lab or enterprise team, understand what they're trying to train, and shape the data collection program that gets them there. In thought leadership, you keep us at the frontier of speech AI — producing go-to-market research and content, speaking at conferences, and establishing Innodata as the most technically credible audio data partner in the market. The two reinforce each other.

Requirements

  • Deep working knowledge of speech/audio AI: how TTS, ASR, and speech-to-speech systems are trained and evaluated, and what data they require.
  • Experience in a solutions engineering, solutions architect, technical presales, or applied/forward-deployed role — or a technical audio/speech background plus strong commercial instincts.
  • Demonstrated ability (and appetite) to produce public-facing technical content and represent a company externally — writing, speaking, or community engagement.
  • Ability to shape ambiguous requirements into precise specs and communicate them to both researchers and business stakeholders.
  • Strong presence and persuasion; comfortable being the technical authority in a sales conversation and on a conference stage.
  • Familiarity with audio technical specifications (sample rates, LUFS, formats), transcript/metadata schemas, and quality metrics (WER, DER).
  • A public body of work in speech/audio: talks, papers, blog posts, benchmarks.
  • Hands-on experience with speech datasets, annotation, or audio production.
  • Background working with or at a frontier AI lab or voice-AI product company.
  • Multilingual / localization exposure.

Responsibilities

  • Partner with customers in presales to understand their model objectives, current data gaps, and technical constraints.
  • Shape requirements: define acoustic specs, language/accent coverage, speaker demographics, emotional/paralinguistic range, transcript and metadata schema, and QA targets (WER/DER, LUFS, etc.).
  • Translate requirements into scoped execution plans — volumes, timelines, methodology, pricing inputs — in partnership with delivery.
  • Serve as the credible technical voice in the room: explain tradeoffs (studio vs. real-world vs. telephonic, scripted vs. spontaneous, single vs. multi-speaker) and defend methodology choices.
  • Build reusable solutioning assets: scoping frameworks, spec templates, reference architectures for common audio data use cases.
  • Stay at the forefront of speech-AI developments (TTS, ASR, speech-to-speech) and anticipate what data the next generation of models will need.
  • Produce go-to-market material: technical blog posts, white papers, benchmark reports, and reference content that demonstrates Innodata's depth.
  • Represent Innodata externally: speak at and work conferences (Interspeech, ICASSP, industry events), engage the speech-AI community, and build our public technical profile.
  • Feed market intelligence back into strategy — advise on emerging data categories and where to invest ahead of demand.