Machine Learning Engineer, LLM Evals & Observability

Glean · Mountain View, CA
Hybrid

About The Position

Glean is seeking a Machine Learning Engineer focused on LLM Evals & Observability. This role is central to measuring and improving the quality of Glean's AI Assistant and Agents. The team owns evaluation pipelines, quality eval-sets, LLM-powered judges, agent observability, and the tooling engineers use to understand changes and their impact. The role is a blend of infrastructure engineering, applied ML, and direct product impact, aimed at making AI quality measurable and driving improvements.

Requirements

  • 2+ years of software engineering experience with strong coding skills.
  • Strong backend fundamentals in Go and Python.
  • Comfortable with distributed data pipelines.
  • Experience working with LLM evaluation, reinforcement learning from human feedback, natural language processing, or other large systems involving machine learning.
  • Analytical rigor: the ability to reason carefully about what offline metrics predict about real user experience.
  • Ability to thrive in a customer-focused, tight-knit, and cross-functional environment.
  • Team player willing to take on whatever is most impactful for the company.
  • A strong commitment to quality, both in the systems you build and in the product being measured.

Responsibilities

  • Design and curate evaluation datasets, including sampling strategies, query diversity, and golden sets for reliable coverage of real assistant behavior.
  • Build and maintain large-scale evaluation pipelines to measure assistant quality across thousands of real user queries.
  • Develop LLM-powered judges to score metrics like correctness, completeness, and response quality, aligning them with human judgment.
  • Evaluate new models and product changes before shipping, providing quality signals to gate launches and prevent regressions.
  • Build observability infrastructure for AI agents, including trace enrichment, data pipelines, and dashboards for inspectable assistant behavior.
  • Close the loop between quality measurement and improvement using eval results, customer feedback, and techniques like automated prompt iteration.
  • Collaborate with engineers across the company to integrate evals as a first-class part of the shipping process.

Benefits

  • Competitive compensation
  • Medical, Vision, and Dental coverage
  • Generous time-off policy
  • Opportunity to contribute to your 401(k) plan
  • Home office improvement stipend
  • Annual education stipend
  • Annual wellness stipend
  • Healthy lunches daily