About The Position

AWS Neuron is the complete software stack for the AWS Inferentia and Trainium cloud-scale machine learning accelerators. This role is for a senior software engineer on the Machine Learning Inference Applications team, responsible for developing and optimizing the performance of the core building blocks of LLM inference: Attention, MLP, Quantization, Speculative Decoding, Mixture of Experts, and more. The team works side by side with chip architects, compiler engineers, and runtime engineers to deliver performance and accuracy on Neuron devices across a range of models such as Llama 3.3 70B, Llama 3.1 405B, DBRX, and Mixtral.

Requirements

  • 3+ years of non-internship professional software development experience
  • 2+ years of non-internship experience in the design or architecture (design patterns, reliability, and scaling) of new and existing systems
  • Programming proficiency in Python or C++ (at least one required)
  • Experience with PyTorch
  • Working knowledge of Machine Learning and LLM fundamentals including transformer architecture, training/inference lifecycles, and optimization techniques
  • Strong understanding of system performance, memory management, and parallel computing principles

Nice To Haves

  • Experience with JAX
  • Experience with debugging, profiling, and implementing software engineering best practices in large-scale systems
  • Expertise with PyTorch, JIT compilation, and AOT tracing
  • Experience with CUDA kernels or equivalent ML/low-level kernels
  • Experience with performant kernel development (e.g., CUTLASS, FlashInfer)
  • Experience with inference serving platforms (vLLM, SGLang, TensorRT) in production environments
  • Deep understanding of computer architecture, operating systems, and parallel computing

Responsibilities

  • Adapting the latest research in LLM optimization to Neuron chips to extract the best performance from both open-source and internally developed models
  • Working across teams and organizations to deliver these optimizations

Benefits

  • Equity and sign-on payments may be provided as part of a total compensation package
  • Full range of medical, financial, and/or other benefits