AI Platform Engineer

fastino.ai
San Francisco, CA
Hybrid

About The Position

Join us at Fastino as we build the next generation of LLMs. Our team, with alumni from Google Research, Apple, Stanford, and Cambridge, is on a mission to develop specialized, efficient AI. Fastino's GLiNER family of open-source models has been downloaded more than 5 million times and is used by companies such as NVIDIA, Meta, and Airbnb. Fastino has raised $25M through our seed round (as featured in TechCrunch) and is backed by leading investors including Microsoft, Khosla Ventures, Insight Partners, GitHub CEO Thomas Dohmke, Docker CEO Scott Johnston, and others.

We are looking for a systems-level engineer to own Fastino’s model platform end-to-end. This is not a feature role. You will own the platform that turns models into production systems.

Requirements

  • Deep experience with PyTorch and transformer architectures
  • Experience building production ML systems end-to-end
  • Experience with distributed training and inference
  • Experience optimizing GPU workloads
  • Strong backend and systems engineering fundamentals
  • Experience with containerization and orchestration
  • Cloud infrastructure experience (AWS, GCP, Modal, Together.ai, etc.)

Nice To Haves

  • Experience with RL or RLHF
  • Experience with distillation and compression
  • Experience building internal ML platforms

Responsibilities

  • Design and build training pipelines
  • Design and build fine-tuning workflows
  • Design and build RL infrastructure
  • Design and build data ingestion and curation systems
  • Design and build inference services
  • Design and build scalable backend architecture
  • Architect distributed fine-tuning pipelines for small encoder and decoder models
  • Implement LoRA, adapters, distillation, and compression workflows
  • Design experiment tracking, reproducibility, and dataset versioning systems
  • Optimize training efficiency (GPU utilization, memory, throughput, cost)
  • Design scalable RL training workflows (policy optimization, reward modeling)
  • Integrate RL with supervised fine-tuning and distillation
  • Build evaluation loops and automated regression detection
  • Build scalable ingestion pipelines for structured and unstructured data
  • Design dataset curation, filtering, and quality enforcement systems
  • Implement reproducible data workflows tied to training runs
  • Architect low-latency inference services
  • Design safe production deployment workflows