About The Position

AWS Trainium is deployed at scale, with millions of chips in production, and has been used for training and inference of frontier models. AWS Neuron is the software stack for Trainium, enabling customers to run deep learning and generative AI workloads with optimal performance and cost efficiency.

AWS Neuron is hiring a Principal Technical Product Manager to drive the product strategy for agentic AI systems that accelerate customer adoption of Trainium. You will own the end-to-end product experience for autonomous and interactive workflows that use generative AI to port, optimize, and tune ML models for Neuron, enabling customers to migrate seamlessly from other platforms and achieve optimal performance with minimal manual intervention. You will define how customers interact with our agentic systems, from interactive IDE-based experiences to fully autonomous production pipelines. You will also drive the strategy for knowledge systems that help agents learn and improve, and for integration with the Neuron SDK and developer tools.

To be successful in this role, you will partner with engineering teams building agentic infrastructure; applied scientists developing in-context learning, post-training, and reinforcement learning approaches; PMs responsible for the Neuron compiler, runtime, and NKI; and Marketing, Business Development, and Solutions Architects supporting customers. You will develop deep knowledge of LLM-based automation, agentic system architectures, model transformation workflows, and kernel optimization to define product strategy effectively and make informed technical decisions.

The Ideal Candidate

The ideal candidate can navigate ambiguity in a fast-moving, early-stage initiative, balance competing priorities across multiple workstreams, and drive alignment across engineering and science stakeholders with excellent written and verbal communication abilities.

About AWS Neuron

AWS Neuron is the software stack for running deep learning and generative AI workloads on AWS Trainium and AWS Inferentia. It includes a compiler, runtime, training and inference libraries, and developer tools for monitoring, profiling, and debugging. Built on an open-source foundation, Neuron supports native PyTorch and JAX frameworks and popular ML libraries without code modification. Neuron enables rapid experimentation, distributed training across multiple chips and nodes, and cost-optimized inference powered by optimized kernels. For performance optimization, Neuron provides the Neuron Kernel Interface (NKI) for direct hardware access and a suite of profiling and debugging tools.

About Amazon Annapurna Labs

The Amazon Annapurna Labs team (our organization within AWS UC) is responsible for building innovation in silicon and software for our AWS customers. We are at the forefront of innovation by combining cloud scale with the world's most talented engineers. Our team covers multiple disciplines, including silicon engineering, hardware design, software, and operations. Because of our team's breadth of talent, we have been able to improve AWS cloud infrastructure in high-performance machine learning with AWS Neuron and the Inferentia and Trainium ML chips, in networking and security with products such as AWS Nitro, Elastic Network Adapter (ENA), and Elastic Fabric Adapter (EFA), and in computing with AWS Graviton and F1 EC2 instances.

About AWS Utility Computing (UC)

AWS Utility Computing (UC) provides product innovations that continue to set AWS's services and features apart in the industry. As a member of the UC organization, you'll support the development and management of Compute, Database, Storage, Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services. Additionally, this role may involve exposure to and experience with Amazon's growing suite of generative AI services and other cloud computing offerings across the AWS portfolio.

About AWS

Amazon Web Services (AWS) is the world's most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating. That's why customers, from the most successful startups to Global 500 companies, trust our robust suite of products and services to power their businesses.

Requirements

  • 7+ years of experience working as a Technical Product Manager
  • Bachelor's degree in computer science, engineering, mathematics, or equivalent
  • 10+ years of industry experience, including 7+ years in technical product management and 3+ years of software development
  • Experience with technical product management for AI/ML platforms, developer tools, or automation systems
  • Demonstrated experience with LLM-based applications, including prompt engineering, agent architectures, or AI-assisted tooling
  • Experience defining metrics and success criteria for AI/ML products
  • Excellent written and verbal communication abilities

Nice To Haves

  • Experience with agentic AI systems (multi-agent orchestration, tool use, autonomous workflows)
  • Knowledge of ML model architectures (transformers, attention mechanisms, mixture-of-experts models)
  • Experience with model optimization techniques (quantization, sharding, speculative decoding)
  • Familiarity with ML frameworks (PyTorch, JAX) and model hubs (Hugging Face)
  • Experience with data platforms or knowledge management systems
  • Understanding of reinforcement learning concepts and post-training pipelines
  • Experience with developer tools and IDE integrations
  • Exposure to compiler technologies, kernel development, or hardware acceleration
  • Track record of driving open source projects or ecosystem partnerships
  • Experience operating in ear

Responsibilities

  • Drive product strategy and roadmap for agentic workflows that automate model porting and performance optimization. Define the vision for autonomous AI-powered systems, balancing automation levels and customer self-service capabilities. Produce PRFAQ and PRD documents that articulate how agentic systems accelerate Neuron adoption.
  • Own the product experience for agentic workflows, from interactive transformations to fully autonomous pipelines. Define requirements for reusable building blocks and infrastructure that compose into end-to-end automation. Drive trade-offs between autonomy, accuracy, and human oversight.
  • Engage with customers adopting Neuron throughout the product lifecycle from evaluation through optimization. Partner with Business Development, Sales, and account teams to support technical evaluation and post-adoption performance tuning. Understand model transformation challenges and optimization needs across customer segments from startups to enterprises. Translate customer pain points into product requirements and drive sustained adoption through technical enablement.
  • Own end-to-end launch strategy for agentic workflow capabilities. Coordinate cross-functional launch activities including technical documentation, field enablement materials, and customer communications. Partner with Marketing and Solutions Architecture teams to drive service awareness and adoption. Define launch success criteria and track adoption metrics.
  • Drive strategy for knowledge systems that help agents learn from past successes and failures. Define requirements for how data is collected, stored, and used to improve agent performance over time. Establish metrics to measure automation effectiveness and business impact on customer adoption.
  • Drive technical alignment across agentic infrastructure, autonomous workflows, and optimization systems. Partner with applied scientists on post-training pipelines, reinforcement learning frameworks, and model fine-tuning approaches. Partner with PMs responsible for the Neuron compiler, NKI, runtime, and profiling tools. Write user stories, validate features, and define success metrics.
  • Own the strategy to deliver agentic workflow capabilities, including open-sourcing and integration into the Neuron SDK, developer tools, and other Amazon products and services. Drive academic collaboration and open-source community contributions. Coordinate with Business Development and Sales teams on strategic partnerships and ecosystem integrations that accelerate customer adoption.
  • Serve as the escalation point for critical product issues impacting customer adoption and performance. Diagnose technical problems with customer engineering teams and coordinate resolution across Neuron components. Own customer communications during incidents and drive follow-up improvements to prevent recurrence.

Benefits

  • health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage)
  • 401(k) matching
  • paid time off
  • parental leave