About The Position

The AWS Applied AI Solutions organization aims to provide business applications that draw on Amazon’s experience and expertise and are used by millions of companies worldwide to manage day-to-day operations. It pursues this by accelerating customer businesses through intuitive, differentiated technology solutions that solve enduring business challenges. The team blends vision with curiosity and Amazon’s real-world experience to build opinionated, turnkey solutions, becoming a trusted partner for customers who prefer to buy over build.

Amazon Connect, launched in 2017, is an AI-powered customer experience solution that transforms how organizations interact with customers. The role involves building and optimizing infrastructure for frontier Large Language Models (LLMs) at massive scale, transforming customer interactions with AI-powered services. As part of a world-class team of ML engineers and scientists within AWS, the individual will develop production ML systems for next-generation cloud computing applications. AWS is the world’s leading cloud platform, and its customers present complex, high-impact problems that give Machine Learning Engineers unique opportunities to deliver real-world impact.

The role operates as a technical leader, owning the design and evolution of large-scale ML infrastructure and partnering with applied scientists, software engineers, and product teams to translate frontier LLM research into highly reliable, efficient, and scalable production systems. This involves working with state-of-the-art GPU and custom accelerator hardware and using AWS’s scale in data and compute for LLM serving and optimization. The individual is expected to design and build highly available, cost-efficient LLM serving systems, optimize inference performance across the full stack, and develop innovative ML infrastructure solutions that accelerate scientific iteration and enhance customer AI experiences.

Requirements

  • 3+ years of non-internship professional software development experience
  • 2+ years of non-internship experience in the design or architecture (design patterns, reliability, and scaling) of new and existing systems
  • 1+ years of software development engineer or related occupational experience
  • 1+ years of experience designing and developing large-scale, multi-tiered, multi-threaded, embedded, or distributed software applications, tools, systems, and services using C#, C++, Java, or Perl
  • 1+ years of Object Oriented Design experience
  • Bachelor's degree or foreign equivalent in Computer Science, Engineering, Mathematics, or a related field
  • Experience programming with at least one software programming language

Nice To Haves

  • 3+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
  • Bachelor's degree in computer science or equivalent

Responsibilities

  • Design, develop, and research machine learning systems end-to-end — building robust ML solutions that translate data science prototypes into production-ready systems that drive real business outcomes.
  • Build, host, and maintain production-grade LLM serving and inference infrastructure — delivering high-quality, highly available, always-on AI systems that customers and internal teams can depend on.
  • Optimize the full inference stack for performance and cost-efficiency — applying techniques such as model quantization, batching strategies, KV-cache management, and accelerator tuning.
  • Partner with cross-functional teams and customers to deeply understand real-world challenges, and iteratively translate requirements into scalable, secure, and cost-effective machine learning solutions on AWS.

Benefits

  • Sign-on payments
  • Restricted stock units (RSUs)
  • Health insurance
  • Medical insurance
  • Dental insurance
  • Vision insurance
  • Prescription coverage
  • Basic Life & AD&D insurance
  • Supplemental life plans
  • EAP
  • Mental Health Support
  • Medical Advice Line
  • Flexible Spending Accounts
  • Adoption and Surrogacy Reimbursement coverage
  • 401(k) matching
  • Paid time off
  • Parental leave
  • Mentorship
  • Career Growth