ML Engineer (Intern)

Pathos
New York, NY
Hybrid

About The Position

We are hiring Machine Learning Engineer Interns. You will work alongside senior researchers and engineers on high-impact projects spanning hyper-scale training and inference infrastructure, pre-training and post-training of multimodal foundation models, knowledge graphs (KG) and retrieval-augmented generation (RAG), and evaluation of reasoning capabilities (logic, metric design, dataset curation). This role is ideal for candidates who want to operate at the intersection of frontier machine learning and real-world, high-stakes research and production systems.

Requirements

  • Strong programming ability in Python
  • Solid fundamentals in machine learning / deep learning through coursework, research, internships, or substantial projects
  • Experience with PyTorch and modern training workflows
  • Comfort operating in ambiguous problem spaces with a bias toward execution

Nice To Haves

  • Experience with distributed systems (e.g., multi-node training, large-scale data loaders, cluster scheduling)
  • Familiarity with performance optimization (profiling, kernel efficiency, GPU utilization, throughput/latency)
  • Research experience (papers, preprints, open-source contributions, or significant independent work)
  • Exposure to biomedical, clinical, or multimodal datasets (helpful but not required)

Responsibilities

  • Use Nsight to profile and analyze the post-training pipeline and identify the stage that dominates wall-clock time (rollout GEMM vs. KV cache I/O vs. weight reloading vs. reward compute)
  • Design and prototype an NCCL-based weight broadcast path that streams updated LoRA (and, optionally, full base) weights directly into the inference engine’s GPU memory
  • Improve hyper-scale training throughput and efficiency by investigating sharding granularity, mixed-precision policy, communication overlap, gradient bucketing, etc.
  • Deep dive into Mixture-of-Experts training strategies: study how to lay out tensor, expert, and data parallel groups on H200 with InfiniBand islands, and compare token-level vs. sequence-level routing
  • Design strategies to maintain training stability and load balancing, including aux-loss design, capacity factor, drop/pad policies, router z-loss, and expert dropout
  • Experiment with and derive best practices for SFT and RL on top of a pre-trained MoE, including router freezing and gradient flow concerns
  • Develop prefill/decode disaggregated serving to decouple long-prompt prefill cost from the autoregressive decode loop; deep dive into node replacement, KV cache transfer over NVLink/InfiniBand, scheduling policy, and how to balance pools as the load mix shifts

Benefits

  • Hands-on experience with infrastructure at the scale of thousands of GPUs
  • Full-cycle multimodal foundation model training, from pre-training to post-training
  • Opportunities to publish in top-tier venues such as NeurIPS, ACL, and ICML
  • Competitive compensation; strong candidates will be considered for full-time roles