QA Automation

TEKsystems
San Francisco, CA
$35 - $50
Remote

About The Position

We are seeking 3 Evaluation Analysts to assess the performance of AI models tasked with implementing web features. Your work directly informs whether AI-generated code is correct, whether the instructions given to the models are clear, and whether the testing frameworks used to evaluate them are fair and reliable. You will analyze the quality of the entire evaluation pipeline, from the instructions given to the AI through to the final score, across the following categories:

  • Model Capability – Assess how well the AI performed each task by reviewing generated transcripts and results.
  • Bug Discovery Value – Identify patterns in AI code failures to understand why the model is making specific mistakes.
  • Score Health – Ensure tests are properly configured to apply correct scores during evaluation.
  • Task Specification Quality – Verify that prompts given to the AI are clear, correct, and technically precise (e.g., consistent variable names).
  • Test-by-Test Analysis – Evaluate the quality of automated tests using metrics like precision and recall to ensure they accurately measure AI performance.
  • Platform Issues – Report bugs or problems within the evaluation system itself, particularly with automated browser testing services.

Requirements

  • Must have 3-4 years of Playwright experience and a strong understanding of LLMs
  • Expertise in finding patterns and issues in generative AI/LLM outputs
  • Direct experience with labeling and scoring frameworks
  • Experience writing Playwright tests
  • Strong analytical and problem-solving skills
  • Ability to interpret model behavior and articulate failure modes clearly

Nice To Haves

  • Advanced analytical credentials (highly relevant for interpreting model behavior)
  • Familiarity with web development concepts (HTML, CSS, JavaScript)
  • Experience with automated testing and grading systems
  • Background in quality assurance or evaluation methodology

Responsibilities

  • Assess how well the AI performed each task by reviewing generated transcripts and results.
  • Identify patterns in AI code failures to understand why the model is making specific mistakes.
  • Ensure tests are properly configured to apply correct scores during evaluation.
  • Verify that prompts given to the AI are clear, correct, and technically precise (e.g., consistent variable names).
  • Evaluate the quality of automated tests using metrics like precision and recall to ensure they accurately measure AI performance.
  • Report bugs or problems within the evaluation system itself, particularly with automated browser testing services.

Benefits

  • Medical, dental & vision
  • Critical Illness, Accident, and Hospital
  • 401(k) Retirement Plan – Pre-tax and Roth post-tax contributions available
  • Life Insurance (Voluntary Life & AD&D for the employee and dependents)
  • Short and long-term disability
  • Health Savings Account (HSA)
  • Transportation benefits
  • Employee Assistance Program
  • Time Off/Leave (PTO, Vacation or Sick Leave)