Applied AI Researcher, Benchmarking

Distyl AI, San Francisco, CA

About The Position

Distyl AI develops AI-native technologies that let humans and AI collaborate to power the operations of the Global Fortune 1000. In just 24 months, we’ve grown to partner with some of the world’s largest enterprises (including F100 telecom, healthcare, manufacturing, insurance, and retail companies), delivering multiple AI deployments with $100M+ impact. Our platform, Distillery, along with our team of AI Engineers, Researchers, and Strategists, is pioneering AI-native systems of work, solving the most complex, high-stakes challenges at scale. Distyl is founded and led by proven leaders from companies like Palantir, Apple, and top national laboratories. We work in deep partnership with OpenAI, jointly going to market at the largest enterprises and collaborating on evaluating and testing the latest models. Backed by Lightspeed, Khosla, Coatue, and industry leaders like Nat Friedman (former GitHub CEO), as well as board members of more than 20 F500 companies, Distyl is building the future of AI-powered enterprise operations.

Requirements

  • Experience designing and running evaluations, including building or maintaining benchmarks, test suites, or experimental frameworks.
  • Statistical and analytical rigor to design fair, reproducible experiments and extract signal from noisy empirical results.
  • Experience building with models, not just building models, with expertise in compound AI systems and associated techniques.
  • Proven track record of research results, including publications in top journals or notable work shared publicly.
  • Daily use of AI tools like ChatGPT, Cursor, and Perplexity to enhance workflow.
  • Strong programming and data analysis skills to build prototypes and perform experiments.
  • Bias towards showing results rather than discussing theoretical ideas.

Responsibilities

  • As part of the Benchmarking team, define how progress is measured.
  • Design evaluation frameworks that capture reasoning depth, interaction quality, reliability, and operational impact.
  • Construct benchmarks that reflect real-world complexity.
  • Explore new paradigms for evaluating intelligent systems, including adversarial robustness testing and longitudinal performance tracking.
  • Investigate how metrics shape model behavior and establish rigorous methodologies for quantifying emergent capability.
  • Drive Distyl’s internal research priorities and industry-wide standards.

Benefits

  • Competitive salary and benefits package, including equity options.
  • Medical, dental, and vision coverage at 100% for you and your dependents.
  • 401(k) plan.
  • Commuter benefits.
  • Lunch provided in office.
  • Collaborative and intellectually stimulating environment.