About The Position

Join the Veritas team, a specialized, innovation-focused group within Amazon's Selection and Catalog Systems (ASCS) Catalog System Services (CSS) organization, as a Software Development Engineer focused on GenAI/ML initiatives. Veritas owns Amazon's premier LLM benchmarking and evaluation platform, which teams across ASCS and the broader company depend on to measure and improve AI performance across the world's largest e-commerce product catalog.

In this role, you'll work directly with Large Language Models (LLMs) and build agents that enhance catalog data quality and customer experience at large scale. You'll have extensive opportunities to work with in-house LLM hosting and inference systems, Amazon Bedrock, prompt translation, prompt tuning techniques, and agentic solutions. As part of the team that evaluates AI performance across billions of products and attributes, you'll help build and leverage AI agents at scale to assess LLM models, their applications, and the customer experiences they power, collaborating with science teams on catalog-specific AI research, prompt translation/optimization, and model evaluation.

Veritas offers a unique opportunity to combine advanced generative AI development with large-scale distributed systems engineering, working with the latest generative AI technology, including custom model hosting infrastructure and advanced prompt engineering tools, on benchmarking and evaluation frameworks that teams across Amazon depend on for their AI development and deployment decisions. Growth opportunities include leading industry-defining benchmarking standards for e-commerce AI and taking on leadership roles across Amazon's catalog ecosystem. We foster a collaborative environment where innovation thrives, technical excellence is celebrated, and every team member shapes the future of AI-powered catalog systems while maintaining work-life balance and continuous learning.

Requirements

  • 3+ years of non-internship professional software development experience
  • 2+ years of non-internship experience designing or architecting new and existing systems (design patterns, reliability, and scaling)
  • Experience programming in at least one programming language

Nice To Haves

  • 3+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
  • Bachelor's degree in computer science or equivalent

Responsibilities

  • Develop systems and agents powered by LLMs and multi-modal LLMs to enhance benchmarking and evaluation across Amazon's catalog ecosystem.
  • Build GenAI-driven solutions that improve evaluation quality and automation for both Bedrock models and open-source LLMs powering Starfish and other Store Agent systems.
  • Design AI-driven workflows across diverse data types to enhance LLM benchmarking and performance measurement.
  • Work extensively with Amazon Bedrock, in-house LLM hosting, and inference systems to build scalable evaluation pipelines for models and applications (see the sketch after this list).
  • Partner with scientists and AI experts to integrate advanced developments in Generative AI, LLM evaluation, and prompt translation/optimization.
  • Create comprehensive datasets, evaluation methodologies, and standardized metrics to generalize benchmarking use cases and accelerate foundation-model switching decisions for various applications.
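For a concrete flavor of the Bedrock-based evaluation work described above, here is a minimal, hypothetical sketch of scoring a single catalog prompt against a Bedrock-hosted model via the boto3 Converse API. The model ID, prompt, and exact-match check are illustrative assumptions for this posting, not the Veritas platform's actual code or metrics.

    # Minimal sketch: score one catalog-attribute prompt against a
    # Bedrock-hosted model. All names below (model ID, prompt, metric)
    # are hypothetical placeholders, not Veritas internals.
    import boto3

    bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

    def evaluate(prompt: str, expected: str, model_id: str) -> bool:
        """Send one evaluation prompt and compare the reply to a reference answer."""
        response = bedrock.converse(
            modelId=model_id,
            messages=[{"role": "user", "content": [{"text": prompt}]}],
            inferenceConfig={"maxTokens": 64, "temperature": 0.0},
        )
        answer = response["output"]["message"]["content"][0]["text"].strip()
        return answer == expected  # exact match; production metrics would be richer

    if __name__ == "__main__":
        passed = evaluate(
            prompt="Give the color attribute of this product title, one word only: "
                   "'Red Cotton Crew-Neck T-Shirt'",
            expected="Red",
            model_id="anthropic.claude-3-5-sonnet-20240620-v1:0",  # example model ID
        )
        print("pass" if passed else "fail")

A real pipeline would batch many such prompts, aggregate richer metrics than exact match, and compare results across Bedrock-hosted and in-house models.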

Benefits

  • Full range of medical, financial, and/or other benefits