EFA Network Software Engineer, EFA Software Team

Amazon
Seattle, WA
$129,300 - $223,600

About The Position

Want to help make the next generation of Machine Learning in the cloud possible? Do you have a laser focus on performance in your code? We want to talk to you! We own the user-space software that makes the Elastic Fabric Adapter (EFA) network card work for Machine Learning (ML) and High-Performance Computing (HPC) customers on AWS. Across multiple projects written in C, our team enables customers to network thousands of GPU and CPU instances to handle the toughest clustered workloads. Be part of a dynamic, fast-paced group that has a big impact every day on the hottest companies doing AI and HPC.

Requirements

  • 3+ years of non-internship professional software development experience.
  • 2+ years of non-internship experience in the design or architecture (design patterns, reliability, and scaling) of new and existing systems.
  • 3+ years of professional experience programming high-performance software in C, ideally as part of an Open Source project.

Nice To Haves

  • 3+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations.
  • Bachelor's degree in computer science or equivalent.
  • Experience developing in a network software stack, with a focus on cutting occupancy to the barest minimum number of instructions.

Responsibilities

  • Write the highest-performing code in C for multiple open source projects supporting EFA, such as Libfabric and Open MPI.
  • Work with multiple teams in the stack to invent new APIs for the latest concepts in networking in the cloud.
  • Dive deep into how customers are doing collectives and messaging at high bandwidth and low latency.
  • Provide expert-level support to some of the biggest names in AI in the world.
  • Invent new ways of cutting the occupancy of the software stack for EFA based on customer needs.
  • Write comprehensive tests to drive the development of new features and guard against regressions.
  • Work with the ML Infrastructure team to see products perform on clusters of hundreds to thousands of top-end machines.

Benefits

  • Equity and sign-on payments may be provided as part of a total compensation package.
  • Full range of medical, financial, and/or other benefits.