Blueprint Technologies
$120,000 - $135,000/Yr
Full-time • Mid Level
Onsite • Redmond, WA
501-1,000 employees

In this role, you will contribute to building and operating AI-powered middle-tier services that support conversational experiences within widely used productivity applications. You will focus on prompt evaluation, testing, and automation, ensuring that AI responses are accurate, reliable, and aligned with business and user expectations. You will work closely with engineering, product, and data partners to evaluate LLM behavior, design test strategies, implement supporting code, and continuously improve prompt quality and system performance.

Responsibilities
  • Design, evaluate, and refine conversational prompts used in AI-driven applications
  • Perform manual and automated testing of LLM outputs to validate accuracy, relevance, and consistency
  • Develop and maintain prompt evaluation frameworks and supporting tooling
  • Set up and manage test environments for AI and prompt validation workflows
  • Write and maintain code (Python or C#) to support evaluations, automation, and analysis
  • Analyze evaluation results and provide data-driven recommendations for prompt improvements
  • Review enhancement requests and translate requirements into technical solutions
  • Prepare detailed software specifications, test plans, and test data
  • Modify and enhance existing systems to meet new standards or requirements
  • Conduct unit testing, quality assurance reviews, and post-implementation validation
  • Support deployment, migration, and implementation activities
  • Troubleshoot issues in both new and legacy systems and resolve defects identified during testing

Qualifications
  • Bachelor’s degree in Computer Science, Computer Engineering, or a related technical field
  • 2–4 years of professional software engineering or AI-related experience
  • Strong foundation in computer science fundamentals, including data structures, algorithms, and software design
  • Experience developing or supporting large-scale software systems
  • Hands-on programming experience in Python or C#
  • Experience with unit testing, debugging, and troubleshooting in production or pre-production systems
  • Ability to analyze requirements and translate them into effective technical solutions
  • Strong problem-solving skills and attention to detail
  • Prior experience with LLM evaluation, prompt engineering, or AI experimentation
  • Background in data science, experimentation, or model evaluation
  • Experience building automated test frameworks or evaluation pipelines
  • Familiarity with conversational AI, chatbots, or virtual assistant systems
  • Experience working with AI/ML-powered applications in production environments
  • Strong analytical mindset with the ability to interpret evaluation results and metrics

Benefits
  • Medical, dental, and vision coverage
  • Flexible Spending Account
  • 401(k) program
  • Competitive PTO
  • Parental leave
  • Opportunities for professional growth and development
© 2024 Teal Labs, Inc