Sr. Software Engineer- AI/ML, AWS Neuron Apps

Amazon.com, Inc. · Seattle, WA · Posted 54 days ago

About The Position

Join the team behind AWS Neuron, the software stack powering AWS's next-generation AI accelerators, Inferentia and Trainium. As a Senior Software Engineer on our Machine Learning Applications team, you'll be at the forefront of deploying and optimizing some of the world's most sophisticated AI models at unprecedented scale.

As a technical leader, you'll architect the bridge between ML frameworks such as PyTorch and JAX and AI hardware. This isn't just about optimization; it's about redefining how AI models run at scale.

Operating at the unique intersection of ML frameworks and custom silicon, our team drives innovation from silicon architecture to production software deployment. We pioneer distributed inference solutions for PyTorch and JAX using XLA, optimize industry-leading LLMs such as GPT and Llama, and collaborate directly with silicon architects to influence the future of AI hardware. Our systems handle millions of inference calls daily, and our optimizations directly impact thousands of AWS customers running critical AI workloads. We focus on pushing the boundaries of large language model optimization, distributed inference architecture, and hardware-specific performance tuning, and our deep technical experts transform complex ML challenges into elegant, scalable solutions that define how AI workloads run in production.

Requirements

  • 5+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience
  • 5+ years of programming experience using Python or C++ and PyTorch.
  • Experience with AI acceleration techniques such as quantization, parallelism, model compression, batching, KV caching, and vLLM serving
  • Experience with accuracy debugging & tooling, performance benchmarking of AI accelerators
  • Strong fundamentals in machine learning and deep learning: model architectures, training and inference lifecycles, and hands-on experience optimizing model execution
  • Deep expertise in Python and ML framework internals
  • Strong understanding of distributed systems and ML optimization
  • Passion for performance tuning and system architecture

Nice To Haves

  • Master's degree in computer science, machine learning, or an equivalent field
  • Experience developing CUDA kernels, HPC and inference optimization, and tensor operations

Responsibilities

  • Pioneer distributed inference solutions for industry-leading LLMs such as GPT, Llama, Qwen
  • Optimize breakthrough language and vision generative AI models
  • Collaborate directly with silicon architects and compiler teams to push the boundaries of AI acceleration
  • Drive performance benchmarking and tuning that directly impacts millions of inference calls globally
  • Spearhead distributed inference architecture for PyTorch and JAX using XLA
  • Engineer breakthrough performance optimizations for AWS Trainium and Inferentia
  • Develop ML tools to enhance LLM accuracy and efficiency
  • Transform complex tensor operations into highly optimized hardware implementations
  • Pioneer benchmarking methodologies that shape next-gen AI accelerator design


What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Industry: General Merchandise Retailers
  • Number of Employees: 5,001-10,000 employees
