Member of Technical Staff, RL Training

Inception · San Francisco, CA

About The Position

Inception builds the world's fastest, most efficient AI models. Our Mercury model is the world's fastest reasoning LLM and the first commercially available diffusion LLM, delivering 5x greater speed and efficiency than today's LLMs with best-in-class quality. We are the researchers and engineers behind breakthrough AI technologies such as diffusion models, flash attention, and DPO. We are looking for Research Scientists / Engineers with deep expertise in post-training large language models. In this role, you will advance our diffusion-based LLM architecture, develop novel training techniques, and push the boundaries of what's possible with parallel token generation.

Requirements

  • BS/MS/PhD in Computer Science or a related field (or equivalent experience).
  • At least 2 years of experience working on ML projects in PyTorch (or equivalent DL framework), preferably in a research lab or engineering role.
  • Strong familiarity with transformers and fundamental LLM concepts (e.g., autoregressive pretraining, instruction tuning, in-context learning, and KV caching).
  • Familiarity with training and inference in diffusion models.
  • Experience with training deep learning models at scale using distributed computing environments.

Nice To Haves

  • Extensive experience training transformer-based language models from scratch
  • Knowledge of advanced training techniques (mixed precision, gradient accumulation, etc.)
  • Experience with multi-modal learning and cross-modal architectures
  • Background in optimization theory and neural network architecture design
  • Experience with LLM serving frameworks such as vLLM, SGLang, or TensorRT

Responsibilities

  • Design, develop, and optimize LLM architectures and models
  • Implement innovative approaches for fine-tuning and scaling generative AI models
  • Work on data preprocessing pipelines, model evaluation, and alignment to enterprise use cases
  • Develop and optimize training objectives and loss functions for LLMs
  • Research and implement techniques for controlled text generation and constraint satisfaction
  • Develop methods for multi-modal integration within the diffusion framework
  • Work on improving model efficiency, reducing training time, and optimizing inference

Benefits

  • Competitive salary and equity in a rapidly growing startup.
  • Flexible vacation and paid time off (PTO).
  • Access to the latest GPU hardware and cloud resources.
  • Health, dental, and vision insurance.
  • A collaborative and inclusive culture.

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: Ph.D. or professional degree
