Citi · Posted 3 months ago
$142,320 - $213,480/Yr
Full-time
New York, NY
5,001-10,000 employees

We are seeking an innovative and detail-oriented professional to lead the development and management of the Generative AI (GenAI) testing and evaluation framework. This role focuses on creating patterns, methodologies, and iterative structures to optimize the performance and effectiveness of GenAI models, with a particular emphasis on prompt engineering and evaluation. The ideal candidate will have a strong background in GenAI, a deep understanding of natural language processing, and a passion for refining AI solutions through rigorous testing and iteration.

Responsibilities:

  • Design and implement a comprehensive testing and evaluation framework for GenAI model outputs.
  • Develop standards and patterns for assessing the quality, or 'goodness', of prompts across diverse use cases.
  • Create iterative processes for testing and refining prompts to optimize model outputs.
  • Establish criteria for evaluating prompt performance, including accuracy, completeness, relevance, coherence, and alignment with desired outcomes.
  • Experiment with prompt structures to identify optimal configurations for various business applications.
  • Develop and document best practices for prompt design and refinement.
  • Work closely with tech partners, engineers, and product teams to ensure testing frameworks integrate seamlessly into the development lifecycle.
  • Partner with stakeholders to understand business requirements and tailor testing methodologies to address specific needs.
  • Provide actionable insights and recommendations to improve model performance based on evaluation results.
  • Identify and implement tools for automating the testing and evaluation process.
  • Develop dashboards and reporting mechanisms to monitor prompt and model performance metrics.
  • Stay updated on emerging tools and techniques in AI testing and integrate them into the framework.
  • Establish feedback loops to iteratively improve testing methodologies and evaluation standards.
  • Establish a process for ongoing monitoring of prompts once they are in production.
  • Monitor industry trends and advancements in Generative AI to ensure the framework remains cutting-edge.
  • Advocate for a culture of experimentation and continuous learning within the organization.
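To make the evaluation criteria above concrete, a framework like the one this role describes might score each model output against weighted per-criterion checks. The following is a minimal illustrative sketch only, not Citi's actual framework: the criterion names, heuristic scorers, weights, and threshold are all assumptions for demonstration (a production system would typically use human- or model-graded rubrics rather than keyword heuristics).

```python
from dataclasses import dataclass

# Hypothetical sketch of a prompt-evaluation harness: each criterion
# returns a score in [0, 1]; a prompt variant passes if the weighted
# average meets a threshold. All names and values are illustrative.

@dataclass
class EvalResult:
    scores: dict
    overall: float
    passed: bool

def completeness(output: str, required_terms: list) -> float:
    """Proxy for completeness: fraction of required terms present."""
    if not required_terms:
        return 1.0
    hits = sum(1 for t in required_terms if t.lower() in output.lower())
    return hits / len(required_terms)

def conciseness(output: str, max_words: int) -> float:
    """Proxy for relevance/conciseness: penalize overly long answers."""
    return 1.0 if len(output.split()) <= max_words else 0.0

def evaluate(output, required_terms, max_words=100, threshold=0.8):
    weights = {"completeness": 0.6, "conciseness": 0.4}
    scores = {
        "completeness": completeness(output, required_terms),
        "conciseness": conciseness(output, max_words),
    }
    overall = sum(weights[k] * scores[k] for k in scores)
    return EvalResult(scores=scores, overall=overall, passed=overall >= threshold)

result = evaluate(
    "Wire transfers settle in 1-2 business days via SWIFT.",
    required_terms=["wire", "SWIFT", "business days"],
)
print(result.passed, round(result.overall, 2))
```

Running many prompt variants through such a harness and comparing their aggregate scores is one way to support the iterative refinement loop described in the bullets above.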
Qualifications:

  • Expertise in Generative AI and natural language processing (NLP) models.
  • Strong proficiency in prompt engineering and familiarity with frameworks for AI evaluation.
  • Hands-on experience with AI tools, libraries, and cloud platforms.
  • Strong problem-solving skills and ability to derive actionable insights from complex data.
  • Attention to detail with a focus on precision and accuracy in evaluation.
  • Deep understanding of AI/ML testing methodologies and best practices.
  • Proficiency in programming languages like Python and experience with relevant libraries (e.g., PyTorch, TensorFlow).
  • Passion for exploring new methodologies to improve AI evaluation frameworks.
  • Creativity in designing experiments and testing approaches.
  • Excellent communication skills to convey technical concepts to diverse audiences.
  • Ability to work collaboratively across cross-functional teams and influence stakeholders.
  • Comfortable working in a fast-paced, dynamic environment.
  • Willingness to learn and adapt to new tools, technologies, and methodologies.
Benefits:

  • Medical, dental & vision coverage
  • 401(k)
  • Life, accident, and disability insurance
  • Wellness programs
  • Paid time off packages, including planned time off (vacation), unplanned time off (sick leave), and paid holidays