Member of Technical Staff, Image Generation - Agent, RL

xAI • Palo Alto, CA

About The Position

The omni team at xAI creates magical AI experiences beyond text, enabling understanding and generation of content across modalities including image, video, and audio. We are building next-generation multimodal agents that reason, act, and communicate through images as naturally as through text. In many real-world scenarios, images convey information more intuitively and efficiently than language alone. This role focuses on advancing image generation models with agentic behavior and reinforcement learning, enabling higher factuality, greater controllability, and seamless integration into multimodal conversational systems. You will work on core research and production systems that allow models to decide when and how to generate images, improving Grok's multimodal responses where images meaningfully augment or replace text.

Requirements

Track record of leading studies that significantly improve the capability and performance of neural networks, whether through better data or better modeling.
Experience with data-driven experiment design and systematic analysis for iterative model debugging.
Experience with supervised fine-tuning (SFT), reinforcement learning (RL), evals, and human/synthetic data.
Experience training models with agentic RL is considered an advantage.

Responsibilities

Developing agentic planners for image generation.
Designing and collecting human/synthetic data; developing data generation techniques, e.g., captioning.
Building evals and reward models for image generation.
Studying training recipes that advance image and multi-image understanding and generation, as well as agent training.