The omni team at xAI creates magical AI experiences beyond text, enabling understanding and generation of content across modalities including image, video, and audio. We are building next-generation multimodal agents that reason, act, and communicate through images as naturally as through text. In many real-world scenarios, images convey information more intuitively and efficiently than language alone. This role focuses on advancing image generation models with agentic behavior and reinforcement learning, enabling higher factuality, controllability, and seamless integration into multimodal conversational systems. You will work on core research and production systems that allow models to decide when and how to generate images, improving Grok's multimodal responses where images meaningfully augment or replace text.
Job Type: Full-time
Career Level: Mid Level
Education Level: No Education Listed
Number of Employees: 1,001-5,000 employees