AI Engineer – LLM Evaluation & Prompt Engineering

Blueprint Technologies • Redmond, WA

Posted 51 days ago • $120,000 - $135,000 • Onsite

About The Position

In this role, you will contribute to building and operating AI-powered middle-tier services that support conversational experiences within widely used productivity applications. You will focus on prompt evaluation, testing, and automation, ensuring that AI responses are accurate, reliable, and aligned with business and user expectations. You will work closely with engineering, product, and data partners to evaluate LLM behavior, design test strategies, implement supporting code, and continuously improve prompt quality and system performance.

Requirements

Bachelor’s degree in Computer Science, Computer Engineering, or a related technical field
2–4 years of professional software engineering or AI-related experience
Strong foundation in computer science fundamentals, including data structures, algorithms, and software design
Experience developing or supporting large-scale software systems
Hands-on programming experience in Python or C#
Experience with unit testing, debugging, and troubleshooting in production or pre-production systems
Ability to analyze requirements and translate them into effective technical solutions
Strong problem-solving skills and attention to detail

Nice To Haves

Prior experience with LLM evaluation, prompt engineering, or AI experimentation
Background in data science, experimentation, or model evaluation
Experience building automated test frameworks or evaluation pipelines
Familiarity with conversational AI, chatbots, or virtual assistant systems
Experience working with AI/ML-powered applications in production environments
Strong analytical mindset with the ability to interpret evaluation results and metrics

Responsibilities

Design, evaluate, and refine conversational prompts used in AI-driven applications
Perform manual and automated testing of LLM outputs to validate accuracy, relevance, and consistency
Develop and maintain prompt evaluation frameworks and supporting tooling
Set up and manage test environments for AI and prompt validation workflows
Write and maintain code (Python or C#) to support evaluations, automation, and analysis
Analyze evaluation results and provide data-driven recommendations for prompt improvements
Review enhancement requests and translate requirements into technical solutions
Prepare detailed software specifications, test plans, and test data
Modify and enhance existing systems to meet new standards or requirements
Conduct unit testing, quality assurance reviews, and post-implementation validation
Support deployment, migration, and implementation activities
Troubleshoot issues in both new and legacy systems and resolve defects identified during testing