Amazon.com · Posted 3 months ago
$129,300 - $223,600/Yr
Senior
Seattle, WA
5,001-10,000 employees
General Merchandise Retailers

AWS Neuron is the complete software stack for the AWS Inferentia and Trainium cloud-scale machine learning accelerators. This role is for a senior software engineer on the Machine Learning Inference Applications team, responsible for the development and performance optimization of the core building blocks of LLM inference: Attention, MLP, Quantization, Speculative Decoding, Mixture of Experts, and more. The team works side by side with chip architects, compiler engineers, and runtime engineers to deliver performance and accuracy on Neuron devices across a range of models such as Llama 3.3 70B, Llama 3.1 405B, DBRX, and Mixtral.
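As context for the "Attention" building block named above, here is a minimal sketch of causal scaled dot-product attention in PyTorch; the tensor layout and function name are illustrative assumptions, not the team's implementation:

    import math
    import torch

    def scaled_dot_product_attention(q, k, v, causal=True):
        # q, k, v: [batch, heads, seq_len, head_dim] (illustrative layout)
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        if causal:
            seq_len = q.size(-2)
            # Mask out positions to the right of each query token
            mask = torch.triu(
                torch.ones(seq_len, seq_len, dtype=torch.bool, device=q.device),
                diagonal=1,
            )
            scores = scores.masked_fill(mask, float("-inf"))
        weights = torch.softmax(scores, dim=-1)
        return weights @ v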

Responsibilities:
  • Adapting the latest research in LLM optimization to Neuron chips to extract the best performance from both open-source and internally developed models.
  • Working across teams and organizations.
Qualifications:
  • 3+ years of non-internship professional software development experience.
  • 2+ years of non-internship experience designing or architecting new and existing systems (design patterns, reliability, and scaling).
  • Programming proficiency in Python or C++ (at least one required).
  • Experience with PyTorch.
  • Working knowledge of Machine Learning and LLM fundamentals including transformer architecture, training/inference lifecycles, and optimization techniques.
  • Strong understanding of system performance, memory management, and parallel computing principles.
  • Experience with JAX.
  • Experience with debugging, profiling, and implementing software engineering best practices in large-scale systems.
  • Expertise with PyTorch, JIT compilation, and AOT tracing.
  • Experience with CUDA kernels or equivalent ML/low-level kernels.
  • Experience with performant kernel development (e.g., CUTLASS, FlashInfer).
  • Experience with inference serving platforms (vLLM, SGLang, TensorRT) in production environments.
  • Deep understanding of computer architecture, operating systems, and parallel computing.
Compensation and benefits:
  • Equity and sign-on payments may be provided as part of a total compensation package.
  • Full range of medical, financial, and/or other benefits.