Research Engineer

Gamma
San Francisco, CA
$180,000 - $340,000 | Onsite

About The Position

You'll own the quality of AI across everything Gamma creates. As our Research Engineer, you'll design evaluation frameworks that measure AI output quality, systematically improve production prompts, and fine-tune models to ensure millions of users get exceptional results every time they generate content. This role sits at the intersection of research rigor and product impact.

You'll diagnose failure patterns in AI-generated presentations, docs, and websites, then craft targeted improvements through iterative experimentation. You'll build the tools and workflows that enable rapid testing, validate changes against quality benchmarks, and ensure our AI gets smarter with every iteration. You'll succeed here if you combine deep technical expertise with a research-oriented mindset, comfort working in ambiguity, and an attention to detail that catches dimensions of quality others might overlook.

Our team has a strong in-office culture and works in person 4–5 days per week in San Francisco. We love working together to stay creative and connected, with flexibility to work from home when focus matters most.

Requirements

  • 2+ years working with AI systems, with demonstrated experience shipping production-grade AI products
  • Deep hands-on experience with prompt engineering, LLM experimentation, and systematic evaluation of AI outputs
  • Strong experimental mindset with the ability to design tests, analyze model performance, and iterate toward measurable quality improvements in ambiguous problem spaces
  • Experience with post-training techniques for LLMs including reinforcement learning and supervised fine-tuning
  • Exceptional attention to detail and genuine quality obsession, with care for output quality across all dimensions including less visible aspects
  • Bachelor's degree in Computer Science, Machine Learning, or a related field, or equivalent hands-on experience with AI research and experimentation

Responsibilities

  • Design and maintain evaluation frameworks that measure AI output quality across all Gamma experiences, developing metrics and benchmarks to assess model performance
  • Systematically improve production prompts through iterative experimentation, diagnosing failure patterns, crafting targeted improvements, and validating against quality benchmarks
  • Conduct rigorous experiments to understand model behavior, analyze results, and derive insights that inform prompt and model improvements
  • Build tools and workflows to support rapid experimentation and quality analysis, enabling faster iteration on AI improvements
  • Fine-tune models on targeted datasets to improve baseline performance, preventing issues like poor layout choices or low-quality outlines
  • Partner with product and engineering teams to ensure AI quality improvements ship quickly and work reliably at scale