About The Position

Language Translation is one of Sanas's most exciting and fastest-growing product lines. We're looking for a Research Engineer who can both set technical direction and get deep in the modeling work — someone who owns translation quality end-to-end across language pairs and drives the fundamental research challenges unique to real-time simultaneous interpretation.

Requirements

  • 3+ years of experience in machine translation, NLP, or multilingual modeling research — with a track record of measurable quality improvements in production systems.
  • Deep familiarity with neural MT architectures: sequence-to-sequence models, Transformer variants, and large multilingual models.
  • Hands-on experience with simultaneous or streaming translation, including segmentation and low-latency decoding strategies.
  • Strong command of MT evaluation methodology — automated metrics, human evaluation design, and error analysis.
  • Proficiency in Python and deep learning frameworks (PyTorch preferred).
  • Demonstrated ability to set a research agenda, execute independently, and communicate findings clearly to technical and non-technical stakeholders.
  • Fluency in English; working proficiency in at least one non-English language is a strong plus.

Nice To Haves

  • Experience with speech translation (end-to-end or cascaded) and speech-aware MT pipelines.
  • Familiarity with on-device or edge-optimized model deployment for low-latency inference.
  • Prior work on low-resource language pairs, domain adaptation, or terminology-constrained translation.
  • Published research at ACL, EMNLP, NAACL, INTERSPEECH, or equivalent venues.

Responsibilities

  • Own and drive improvements to translation accuracy across Sanas's supported language pairs, with a focus on conversational, spoken-language domains.
  • Design, train, and evaluate neural MT models — from fine-tuning large multilingual models to building targeted components for low-resource or high-priority language pairs.
  • Develop and maintain rigorous evaluation pipelines using both automated metrics (BLEU, COMET, chrF) and human evaluation frameworks calibrated to real-world enterprise use cases.
  • Identify the highest-leverage research bets — data augmentation, domain adaptation, quality estimation, terminology consistency — and execute on them with measurable quality gains.
  • Lead research and development of Sanas's delimiter model — the component that determines optimal segmentation points in streaming speech for real-time translation output.
  • Develop methods to handle speech disfluencies, sentence fragments, and incomplete utterances gracefully in a streaming translation pipeline.
  • Collaborate closely with the speech and inference engineering teams to ensure translation components meet strict real-time latency budgets in production.
  • Define and maintain a research roadmap for MT and simultaneous interpretation, prioritizing work that moves production quality metrics.
  • Stay at the frontier of MT research: track and evaluate new work, and translate (haha) relevant advances into practical improvements at Sanas.
  • Mentor and technically guide other engineers working on translation-adjacent problems across the ML org.
  • Identify, source, and curate training data for MT and delimiter modeling — including parallel corpora, synthetic data generation, and speech-aware augmentation strategies.
  • Instrument model quality monitoring in production to detect degradation across language pairs and trigger targeted retraining cycles.