Member of Technical Staff - Efficient ML

Moonlake
San Mateo, CA
Onsite

About The Position

Introducing Moonlake, AI for creating world simulations.

Scope of Work

Training efficiency
- Dataloaders, fusion, activation remat, gradient checkpointing.
- FSDP/ZeRO/tensor+pipeline parallel; NCCL tuning.

GPU + kernel performance
- Nsight profiling, Triton/CUDA kernels, fused ops.
- Flash-attention–style speedups, sequence packing, KV-cache tricks.

Inference optimization
- Low-latency serving, continuous batching, speculative decoding.
- Quantization (GPTQ/AWQ), distillation, pruning.

Infra + reliability
- SLURM/K8s multi-node jobs, checkpoint hygiene.
- Determinism, env pinning, GPU failure handling.

We are committed to being an on-site, in-person team currently based in San Mateo.
