Member of Technical Staff, Pre-Training

Inception
San Francisco, CA

About The Position

The Role

We seek experienced scientists and engineers with deep expertise in pre-training and mid-training large language models. You will advance our diffusion-based LLMs, developing novel training techniques and pushing the boundaries of parallel token generation.

Requirements

  • BS/MS/PhD in Computer Science or a related field (or equivalent experience).
  • At least 2 years of experience working on ML projects in PyTorch (or equivalent), preferably in a research lab or engineering role.
  • Strong familiarity with transformers and core LLM concepts (autoregressive pretraining, instruction tuning, in-context learning, KV caching).
  • Familiarity with training and inference in diffusion models.
  • Experience training deep learning models at scale in distributed computing environments.

Nice To Haves

  • Extensive experience training transformer-based language models from scratch.
  • Knowledge of advanced training techniques (mixed precision, gradient accumulation, etc.).
  • Experience with multi-modal learning and cross-modal architectures.
  • Background in optimization theory and neural network architecture design.
  • Experience with LLM serving frameworks like vLLM, SGLang, or TensorRT.

Responsibilities

  • Design, develop, and optimize architectures for diffusion-based language models.
  • Implement innovative training objectives and loss functions for discrete diffusion LLMs.
  • Research and implement techniques for controlled text generation and constraint satisfaction.
  • Develop methods for multi-modal integration within the diffusion framework.
  • Improve model efficiency, reduce training time, and optimize inference throughput.