ML Engineer (Intern)

Pathos•New York, NY

Hybrid

About The Position

We are hiring Machine Learning Engineer Interns. You will work alongside senior researchers and engineers on high-impact projects spanning:

Hyper-scale training and inference infrastructure
Pre-training and post-training of multimodal foundation models
Knowledge graphs (KG) and retrieval-augmented generation (RAG)
Evaluation of reasoning capabilities (logic, metric design, dataset curation)

This role is ideal for candidates who want to work at the intersection of frontier machine learning and real-world, high-stakes research and production systems.

Requirements

Strong programming ability in Python
Solid fundamentals in machine learning / deep learning through coursework, research, internships, or substantial projects
Experience with PyTorch and modern training workflows
Comfort operating in ambiguous problem spaces with a bias toward execution

Nice To Haves

Experience with distributed systems (e.g., multi-node training, large-scale data loaders, cluster scheduling)
Familiarity with performance optimization (profiling, kernel efficiency, GPU utilization, throughput/latency)
Research experience (papers, preprints, open-source contributions, or significant independent work)
Exposure to biomedical, clinical, or multimodal datasets (helpful but not required)

Responsibilities

Use Nsight to profile and analyze the post-training pipeline and identify which stage dominates wall-clock time (rollout GEMMs vs. KV cache I/O vs. weight reloading vs. reward computation)
Design and prototype an NCCL-based weight broadcast path that streams updated LoRA (and, optionally, full base) weights directly into the inference engine's GPU memory
Improve hyper-scale training throughput and efficiency by investigating sharding granularity, mixed-precision policy, communication overlap, gradient bucketing, etc.
Deep dive into Mixture-of-Experts (MoE) training strategies: study how to lay out tensor-, expert-, and data-parallel groups on H200 clusters organized into InfiniBand islands, and compare token-level vs. sequence-level routing
Design strategies to maintain training stability and load balancing, including auxiliary-loss design, capacity factors, token drop/pad policies, router z-loss, and expert dropout
Run experiments to derive best practices for SFT and RL on top of a pre-trained MoE, including router freezing and gradient-flow concerns
Develop prefill/decode disaggregated serving to decouple long-prompt prefill cost from the autoregressive decode loop; dive deep into node replacement, KV cache transfer over NVLink/InfiniBand, scheduling policy, and how to rebalance pools as the load mix shifts

Benefits

Hands-on experience with thousand-GPU-scale infrastructure
Full-cycle multimodal foundation model training, from pre-training to post-training
Opportunities to publish in top-tier venues such as NeurIPS, ACL, and ICML
Competitive compensation; strong candidates will be considered for full-time roles
