About The Position

We are seeking a Machine Learning Engineer to lead the fine-tuning, optimization, and deployment of AI models for diverse tasks, with a strong emphasis on on-device inference. You will work on cutting-edge applications such as orchestration, planning, multi-agent coordination, and other intelligent decision-making systems. You will be responsible for adapting foundation models (LLMs, multimodal models) to specialized domains, making them fast, accurate, and efficient for resource-constrained environments while ensuring robustness and safety.

Requirements

  • 5+ years of experience in applied machine learning, including at least 3 years in LLM fine-tuning.
  • Proficiency in Python and the ML framework ecosystem (Hugging Face, PyTorch).
  • Strong understanding of transformer architectures, attention mechanisms, and PEFT techniques.
  • Experience with on-device inference optimization (OpenVINO, ONNX, QNN).
  • Familiarity with orchestration/planning architectures and techniques for AI assistants.
  • Track record of delivering production-ready ML solutions in latency-sensitive environments.

Nice To Haves

  • Experience with multi-agent systems or AI assistant orchestration.
  • Familiarity with advanced inference optimization techniques such as KV cache paging and FlashAttention.
  • Knowledge of common inference engines, including but not limited to llama.cpp and vLLM.

Responsibilities

  • Model Fine-Tuning & Adaptation: Fine-tune large language models, multimodal models, and task-specific models for orchestration, planning, and other defined workflows.
  • Design and run experiments to improve task accuracy, robustness, and generalization.
  • Explore and apply methods such as full fine-tuning, LoRA, QLoRA, and other parameter-efficient fine-tuning approaches.
  • Employ advanced techniques such as QAT, DPO, and GRPO to further improve model quality.
  • On-Device Optimization: Prune, quantize, and compress models (e.g., INT8, INT4, mixed precision) for CPUs, GPUs, NPUs, and edge accelerators.
  • Optimize models for low-latency inference using frameworks such as OpenVINO, ONNX Runtime, and QNN.
  • Data Pipeline & Deployment: Build robust data pipelines for domain-specific datasets, including synthetic data generation and annotation.
  • Define evaluation metrics, run evaluations, and analyze results.
  • Establish best practices for versioning, reproducibility, and continuous improvement of model performance.
  • AI Orchestration & Planning: Develop and refine models to support multi-step reasoning, tool orchestration, and decision planning.
  • Work with stakeholders on orchestrator architecture.
  • Collaborate with product and research teams to design intelligent, context-aware assistant capabilities.

Benefits

  • Health insurance
  • Dental insurance
  • Vision insurance
  • Long term/short term disability insurance
  • Employee assistance program
  • Flexible spending account
  • Life insurance
  • Generous time off policies, including: 4-12 weeks fully paid parental leave based on tenure
  • 11 paid holidays
  • Additional flexible paid vacation and sick leave