Prompting & AI Agent Research Engineer

Hello Patient • Austin, TX

$180,000 - $230,000

About The Position

Hello Patient is seeking a Prompting & AI Agent Research Engineer to bring structure and rigor to how we advance our AI. We build Mia, an AI voice agent that handles inbound patient calls for multi-location healthcare providers - and as we grow, we need someone who can help us make more confident, evidence-backed decisions about how Mia evolves. You'll design experiments we can trust, stay close to what's emerging in the research community, and turn promising ideas into something testable. This role sits at the intersection of prompt engineering and applied AI research. You should be comfortable enough technically to prototype and run local experiments independently. Above all, we're looking for someone with deep, hands-on prompting experience - someone who has deliberately engineered prompts to change how a system behaves.

Requirements

5+ years of experience in a prompting, AI research, or applied AI role
Advanced degree in a research-oriented field (PhD preferred) - CS, linguistics, cognitive science, stats, or similar
Real prompt engineering experience - deliberately designing, testing, and improving prompts to change system behavior
Solid experimental design fundamentals: controls, statistical significance, knowing when a result actually means something
Hands-on experience working with LLMs in applied contexts
Comfort with RAG, agentic architectures, and modern LLM tooling
Ability to evaluate and validate AI system behavior - understanding what the model is doing and why
Clear written communication; your findings need to land with engineers and non-technical stakeholders alike

Nice To Haves

Experience with voice AI or conversational systems
Research background with a focus on prompting techniques, prompt optimization, or evaluation of LLM behavior
Time at an AI lab, applied AI team, or early-stage startup
Familiarity with LLM evaluation frameworks and behavioral test suites
Published or applied work in prompt design, chain-of-thought, or related areas

Responsibilities

Design structured experiments with real controls and statistical rigor
Build regression test coverage so we know when a global prompt change breaks workflows we didn't expect it to affect
Create a pipeline of validated, high-confidence improvements engineering can pull from instead of building on faith
Work with the team to prioritize what actually gets shipped based on what the experiments say
Own prompt design across Mia's agent workflows - multi-turn, voice-first, healthcare context
Stay current on what's coming out in prompting research and agent architecture, and bring back ideas worth testing
Prototype new approaches locally before anything touches production
Be the person the Product team comes to when they're stuck on a prompting problem
Work closely with our Product and Engineering teams - understand where Mia is struggling and help decide what's worth experimenting on
Translate experiment results into clear recommendations people can actually act on
Build out evaluation templates and frameworks so this doesn't live only in your head