Amazon.com, posted 3 months ago
$129,300 - $223,600/Yr
Senior
Seattle, WA
5,001-10,000 employees
General Merchandise Retailers

AWS Neuron is the complete software stack for the AWS Inferentia and Trainium cloud-scale machine learning accelerators. This role is for a senior software engineer on the Machine Learning Inference Applications team, responsible for developing and optimizing the performance of the core building blocks of LLM inference: attention, MLP, quantization, speculative decoding, mixture of experts, and others. The team works side by side with chip architects, compiler engineers, and runtime engineers to deliver performance and accuracy on Neuron devices across a range of models such as Llama 3.3 70B, Llama 3.1 405B, DBRX, and Mixtral.

  • Adapting the latest research in LLM optimization to Neuron chips to extract the best performance from both open-source and internally developed models.
  • Working across teams and organizations to ensure effective collaboration.
  • 3+ years of non-internship professional software development experience
  • 2+ years of non-internship experience in the design or architecture (design patterns, reliability, and scaling) of new and existing systems
  • Programming proficiency in Python or C++ (at least one required)
  • Experience with PyTorch
  • Working knowledge of Machine Learning and LLM fundamentals including transformer architecture, training/inference lifecycles, and optimization techniques
  • Strong understanding of system performance, memory management, and parallel computing principles
  • Experience with JAX
  • Experience with debugging, profiling, and implementing software engineering best practices in large-scale systems
  • Expertise with PyTorch, JIT compilation, and AOT tracing
  • Experience with CUDA kernels or equivalent ML/low-level kernels
  • Experience with performant kernel development (e.g., CUTLASS, FlashInfer)
  • Experience with inference serving platforms (vLLM, SGLang, TensorRT) in production environments
  • Deep understanding of computer architecture, operating systems, and parallel computing
  • Medical benefits
  • Financial benefits
  • Equity options
  • Sign-on payments
  • Comprehensive employee benefits package