About The Position

Innodata is expanding its team of technical experts in LLM training, post-training, and evaluation systems. As an AI/ML Research Engineer, LLM Training & Evaluation, you will build and optimize the technical foundations that power model improvement for foundation model builders and leading labs. This role is ideal for someone who has hands-on experience fine-tuning and evaluating large language models (and ideally multimodal models), and who can bridge research and engineering in real-world customer environments. You will work closely with Language Data Scientists, Applied Research Scientists, data engineers, and client technical stakeholders to design and implement robust training/evaluation pipelines using both human-in-the-loop and AI-augmented methods. The ideal candidate brings a strong computer science / machine learning engineering background, experience with modern LLM post-training workflows, and the ability to engage credibly with technical counterparts at leading AI organizations.

Requirements

  • BS/MS/PhD in Computer Science, Machine Learning, AI, Applied Mathematics, or a related quantitative technical field (MS/PhD preferred)
  • 2-3 years of relevant industry or research engineering experience in ML/AI systems
  • Hands-on experience with LLM training / fine-tuning / post-training, including at least one of: supervised fine-tuning (SFT), preference optimization (e.g., DPO or related methods), RLHF / RLAIF-style workflows, or task or domain adaptation of foundation models
  • Strong programming skills in Python and experience building production-quality ML code
  • Experience with modern ML frameworks (e.g., PyTorch, JAX, TensorFlow) and model libraries/tooling (e.g., Hugging Face ecosystem, vLLM, distributed training stacks)
  • Experience designing and implementing evaluation pipelines for LLM/ML systems, including metrics computation, dataset handling, and experiment comparisons
  • Strong understanding of data pipelines and ML systems engineering, including reproducibility, observability, and debugging
  • Experience with large-scale distributed ML systems and performance optimization for training/evaluation workloads (GPU/accelerator environments preferred)
  • Experience with large-scale data processing and workflow orchestration in support of model training/evaluation
  • Ability to collaborate directly with technical stakeholders including research scientists, ML engineers, data engineers, and customer technical leads
  • Strong written and verbal communication skills, including the ability to explain complex technical tradeoffs to both technical and non-technical audiences
  • Experience training, fine-tuning, and evaluating transformer-based models
  • Understanding of post-training workflows and model iteration loops
  • Familiarity with inference-time considerations (latency, throughput, memory/performance tradeoffs) where relevant to evaluation or deployment
  • Experience implementing automated evaluation pipelines and test harnesses
  • Experience with experiment tracking, versioning, and reproducibility practices
  • Ability to assess metric quality and ensure consistency across model comparisons
  • Proficiency in Python and strong software engineering fundamentals
  • Experience with data processing pipelines, storage formats, and scalable dataset workflows
  • Familiarity with CI/CD, testing, and engineering quality practices for ML systems

Responsibilities

  • Design and implement the pipelines and tooling that connect data, evaluation, and post-training.
  • Help customers and internal teams move from evaluation findings to measurable model improvements.
  • Build fine-tuning workflows (e.g., supervised fine-tuning and preference-based optimization).
  • Integrate evaluation harnesses into model development loops.
  • Improve experiment reliability and throughput.
  • Support advanced evaluation scenarios such as long-context, cross-modal, and dynamic multi-turn interactions.
  • Contribute to Innodata’s internal R&D efforts, including benchmark datasets, evaluation frameworks, and reusable infrastructure for model assessment and post-training experimentation.
  • Lead or co-lead technically complex ML engineering projects from initial customer discussions through implementation and delivery.
  • Design, build, and improve LLM training and post-training pipelines, including data ingestion, preprocessing, fine-tuning, evaluation, and experiment tracking.
  • Implement and optimize evaluation systems for LLMs and multimodal models, including offline benchmarks and task-specific test harnesses.
  • Integrate human-in-the-loop and AI-augmented evaluation signals into model development workflows.
  • Build robust infrastructure and tooling for reproducible experimentation, metrics logging, and regression monitoring.
  • Diagnose model behavior and pipeline failures, including data issues, training instability, metric inconsistencies, and evaluation drift.
  • Collaborate with Language Data Scientists and Applied Research Scientists to translate evaluation frameworks into executable systems.
  • Work closely with customer technical stakeholders to understand goals, constraints, and success criteria; propose and implement technically sound solutions.
  • Contribute to internal research and platform development, including benchmark frameworks, evaluation tooling, and post-training workflow improvements.
  • Contribute to best practices and standards for LLM training, evaluation, and quality assurance across projects.
  • Mentor junior engineers and contribute to technical design reviews, documentation, and engineering rigor across the team.