About The Position

This role focuses on improving agentic capabilities in Excel: identifying and prioritizing opportunities in agent reasoning and model capability, defining the quality bar and evaluation frameworks for reasoning quality, prototyping new agent behaviors, and collaborating closely with research, applied ML, data, infrastructure, and product teams. Detailed responsibilities are listed below.

Requirements

  • Bachelor's Degree AND 5+ years of experience in product/service/program management or software development, OR equivalent experience.
  • Bachelor's Degree AND 8+ years of experience in product/service/program management or software development, OR equivalent experience.
  • 2+ years of experience taking a product, feature, or experience to market (e.g., design, addressing product-market fit, launch, internal tools/frameworks).
  • 4+ years of experience improving product metrics for a product, feature, or experience in a market (e.g., growing the customer base, expanding customer usage, reducing customer churn).
  • 4+ years of experience disrupting a market with a product, feature, or experience (e.g., competitive disruption, displacing an established competing product).
  • 3+ years of experience leading ambiguous product areas, defining requirements, setting direction, and partnering with cross-functional teams to deliver outcomes.
  • 2+ years of experience building or shipping ML-powered or LLM-powered products.
  • Hands-on experience working with LLM APIs (e.g., OpenAI, Anthropic, Azure OpenAI), embeddings, vector databases, and tool-based systems.
  • Hands-on experience with prompt design, context management, and model evaluation techniques.
  • Demonstrated analytical and problem-solving ability, with the capacity to reason about complex systems and diverse user workflows.
  • Excellent communication and collaboration skills, including the ability to work effectively with engineering, research, and data partners.
  • Experience translating user intent and behavior into product strategy, and partnering closely with engineering and ML teams to deliver scalable, personalized experiences.
  • 1+ years of hands-on experience building LLM-powered applications, including familiarity with agent and orchestration frameworks, tool use, model evaluation, and efficiency optimization.
  • Demonstrated technical depth in software development, data science, or machine learning, with the ability to reason over complex systems and large datasets.
  • Comfort defining and executing zero-to-one prototypes to validate ideas and inform product direction, even when not writing production code.

Responsibilities

  • Identify and prioritize agent reasoning and model capability opportunities to improve agentic Excel capabilities.
  • Define the quality bar for agent reasoning, making informed tradeoffs across capability, latency, cost, and reliability.
  • Translate real user workflows and mental models into reasoning requirements and evaluation criteria.
  • Define success metrics and evaluation frameworks to measure reasoning quality, correctness, and robustness.
  • Design data collection and labeling tasks to evaluate models and generate training data for fine-tuning and alignment.
  • Prototype and validate new agent behaviors, reasoning patterns, and feature directions.
  • Develop and iterate on prompts and policies to guide consistent, high-quality model behavior across scenarios.
  • Deploy, monitor, and analyze A/B experiments for model and reasoning changes in production.
  • Incorporate user feedback and telemetry to continuously improve model behavior and reliability.
  • Collaborate cross-functionally with research, applied ML, data, infrastructure, and product teams.