About The Position

Build and evolve the Real-Time Intelligence evaluations platform. In this role you will implement offline and online eval pipelines, instrument agentic solutions for observability, integrate evals into the development lifecycle, and collaborate with and mentor partners across product, research, and engineering. Each of these areas is described in detail under Responsibilities below.

Requirements

  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
  • 2+ years of experience in engineering tooling or eval development.
  • 1+ year of experience driving fundamentals for AI features within web apps.
  • Master's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR equivalent experience.
  • Prior experience working on services at scale.
  • Understanding of how to build scalable, server-side engineering tools.
  • Prior experience working closely with AI feature teams to improve fundamentals such as performance and reliability is a major plus.
  • Experience solving challenging problems, along with strong cross-team and cross-organization collaboration skills.

Nice To Haves

  • Proficiency with React is a plus.
  • Curiosity to dive deep, continuously learn and experiment.
  • Passion for collaboration and knowledge sharing.

Responsibilities

  • Implement offline and online eval pipelines, including golden datasets, human review workflows, and LLM-as-judge / auto-raters for agents, anomaly detectors, and decisioning systems.
  • Instrument agentic solutions for observability by wiring up telemetry, tracing, structured logging, and dashboards so quality, safety, latency, and cost are easy to monitor and debug.
  • Integrate evals into the development lifecycle by connecting pipelines to CI/CD, canary and A/B experiments, and phased rollouts, making it simple for partner teams to run and interpret evaluations.
  • Collaborate and mentor across product, research, and engineering teams, sharing best practices on eval design, LLM-as-judge usage, and Responsible AI, and providing code reviews and guidance that raise the bar for the AI features.