About The Position

A1 is building a proactive AI system that understands context across conversations, plans actions, and carries work forward over time. You will be responsible for turning research direction into working, production-grade ML systems. This role owns the execution layer of A1’s intelligence – training pipelines, inference systems, evaluation tooling, and deployment.

Requirements

  • Strong background in deep learning and transformer-based architectures.
  • Hands-on experience training, fine-tuning, or deploying large-scale ML models in production.
  • Proficiency with at least one modern ML framework (e.g. PyTorch, JAX) and the ability to learn others quickly.
  • Experience with distributed training and inference frameworks (e.g. DeepSpeed/ZeRO, FSDP, Megatron-LM, Ray).
  • Strong software engineering fundamentals – you write robust, maintainable, production-grade systems.
  • Experience with GPU optimization, including memory efficiency, quantization, and mixed precision.
  • Comfort owning ambiguous, zero-to-one ML systems end-to-end.
  • A bias toward shipping, learning fast, and improving systems through iteration.

Nice To Haves

  • Experience with LLM inference frameworks such as vLLM, TensorRT-LLM, or FasterTransformer.
  • Contributions to open-source ML or systems libraries.
  • Background in scientific computing, compilers, or GPU kernels.
  • Experience with RLHF or other preference-optimization pipelines (PPO, DPO, ORPO).
  • Experience training or deploying multimodal or diffusion models.
  • Experience with large-scale data processing (Apache Arrow, Spark, Ray).

Responsibilities

  • Build and own end-to-end ML pipelines spanning data, training, evaluation, inference, and deployment.
  • Fine-tune and adapt models using state-of-the-art methods such as LoRA, QLoRA, SFT, DPO, and distillation.
  • Architect and operate scalable inference systems, balancing latency, cost, and reliability.
  • Design and maintain data systems for high-quality synthetic and real-world training data.
  • Implement evaluation pipelines covering performance, robustness, safety, and bias, in partnership with research leadership.
  • Own production deployment, including GPU optimization, memory efficiency, latency reduction, and scaling policies.
  • Collaborate closely with application engineering to integrate ML systems cleanly into backend, mobile, and desktop products.
  • Make pragmatic trade-offs and ship improvements quickly, learning from real usage.
  • Work under real production constraints: latency, cost, reliability, and safety.