Research Engineer (Machine Translation)

Sanas • Palo Alto, CA

About The Position

Language Translation is one of Sanas's most exciting and fastest-growing product lines. We're looking for a Research Engineer who can both set technical direction and get deep in the modeling work — someone who owns translation quality end-to-end across language pairs and drives the fundamental research challenges unique to real-time simultaneous interpretation.

Requirements

3+ years of experience in machine translation, NLP, or multilingual modeling research — with a track record of measurable quality improvements in production systems.
Deep familiarity with neural MT architectures: sequence-to-sequence models, Transformer variants, and large multilingual models.
Hands-on experience with simultaneous or streaming translation, including segmentation and low-latency decoding strategies.
Strong command of MT evaluation methodology — automated metrics, human evaluation design, and error analysis.
Proficiency in Python and deep learning frameworks (PyTorch preferred).
Demonstrated ability to set a research agenda, execute independently, and communicate findings clearly to technical and non-technical stakeholders.

Nice To Haves

Fluency in English plus working proficiency in at least one non-English language.
Experience with speech translation (end-to-end or cascaded) and speech-aware MT pipelines.
Familiarity with on-device or edge-optimized model deployment for low-latency inference.
Prior work on low-resource language pairs, domain adaptation, or terminology-constrained translation.
Published research at ACL, EMNLP, NAACL, INTERSPEECH, or equivalent venues.

Responsibilities

Own and drive improvements to translation accuracy across Sanas's supported language pairs, with a focus on conversational, spoken-language domains.
Design, train, and evaluate neural MT models — from fine-tuning large multilingual models to building targeted components for low-resource or high-priority language pairs.
Develop and maintain rigorous evaluation pipelines using both automated metrics (BLEU, COMET, chrF) and human evaluation frameworks calibrated to real-world enterprise use cases.
Identify the highest-leverage research bets — data augmentation, domain adaptation, quality estimation, terminology consistency — and execute on them with measurable quality gains.
Lead research and development of Sanas's delimiter model — the component that determines optimal segmentation points in streaming speech for real-time translation output.
Develop methods to handle speech disfluencies, sentence fragments, and incomplete utterances gracefully in a streaming translation pipeline.
Collaborate closely with the speech and inference engineering teams to ensure translation components meet strict real-time latency budgets in production.
Define and maintain a research roadmap for MT and simultaneous interpretation, prioritizing work that moves production quality metrics.
Stay at the frontier of MT research: track and evaluate new work, and translate relevant advances into practical improvements at Sanas.
Mentor and technically guide other engineers working on translation-adjacent problems across the ML org.
Identify, source, and curate training data for MT and delimiter modeling — including parallel corpora, synthetic data generation, and speech-aware augmentation strategies.
Instrument model quality monitoring in production to detect degradation across language pairs and trigger targeted retraining cycles.