Machine Learning Engineer

Sciforium · San Francisco, CA

About The Position

As a Machine Learning Engineer, you’ll work across the full foundation-model stack: pretraining and scaling, post-training and reinforcement learning (RL), sandbox environments for evaluation and agentic learning, and deployment and inference optimization. You’ll iterate quickly on research ideas, contribute production-grade infrastructure, and help deliver models that serve real-world use cases at scale.

Requirements

  • Strong general software engineering skills (writing robust, performant systems)
  • Experience with training or serving large neural networks (LLMs or similar)
  • Solid grasp of deep learning fundamentals and modern literature
  • Comfort working in high-performance computing environments (GPUs, distributed systems, etc.)

Nice To Haves

  • Pretraining / large-scale distributed training (FSDP/ZeRO/Megatron-style systems)
  • Post-training pipelines (SFT, RLHF/RLAIF, preference optimization, eval loops)
  • Building RL environments, simulators, or agent frameworks
  • Inference optimization, model compression, quantization, kernel-level profiling
  • Building large ETL pipelines for internet-scale data ingestion and cleaning
  • Owning end-to-end production ML systems with monitoring and reliability
  • Ability to propose and evaluate research ideas quickly
  • Strong experimental hygiene: ablations, metrics, reproducibility, analysis
  • Bias toward building — you can turn ideas into working code and results

Responsibilities

  • Train large byte-native foundation models across massive, heterogeneous corpora
  • Design stable training recipes and scaling laws for novel architectures
  • Improve throughput, memory efficiency, and utilization on large GPU clusters
  • Build and maintain distributed training infrastructure and fault-tolerant pipelines
  • Develop post-training pipelines (SFT, preference optimization, RLHF/RLAIF, RL)
  • Curate and generate targeted datasets to improve specific model capabilities
  • Build reward models and evaluation frameworks to drive iterative improvement
  • Explore inference-time learning and compute techniques to enhance performance
  • Build scalable sandbox environments for agent evaluation and learning
  • Create realistic, high-signal automated evals for reasoning, tool use, and safety
  • Design offline + online environments that support RL-style training at scale
  • Instrument environments for observability, reproducibility, and iteration speed
  • Optimize inference throughput/latency for byte-native architectures
  • Build high-performance serving pipelines (KV caching, batching, quantization, etc.)
  • Improve end-to-end model efficiency, cost, and reliability in production
  • Profile and optimize GPU kernels, runtime bottlenecks, and memory behavior

Benefits

  • Medical, dental, and vision insurance
  • 401(k) plan
  • Daily lunch, snacks, and beverages
  • Flexible time off
  • Competitive salary and equity