About The Position

Meta is seeking hands-on engineering managers to join the Meta SuperIntelligence Lab (MSL), making direct contributions to the next generation of Generative AI models. The MSL Infra Optimizations team’s mission is to enable the development and productionization of cutting-edge high-performance optimizers, kernels and numeric algorithms, providing comprehensive analyses of their effect for MSL. Join us and be a part of the team that is shaping the future of Meta SuperIntelligence Lab’s infrastructure!

Requirements

  • MS or BS in Computer Science or Electrical/Electronics Engineering or equivalent
  • 3+ years of experience directly managing or leading a team of engineers with varied skill levels
  • Experience leading teams working on high performance computing (HPC) and AI/ML systems, including GPU/ASIC-based kernel development and optimization (e.g. CUDA), distributed systems for large-scale training and serving, systems architecture and performance, and large-scale distributed systems
  • Experience running a large-scale program and dealing with ambiguity
  • Familiarity with the latest techniques in optimizing GenAI workloads
  • Experience using frameworks like PyTorch and TorchTriton to develop custom kernels
  • Understanding of kernel enablement and optimization, including experience working on attention kernels
  • Understanding of GPU memory hierarchy and computation capabilities
  • Understanding of low-level CUDA kernel optimizations for inference and training
  • Experience with quantization and structured sparsity for low-precision training and inference
  • Understanding of optimizers such as Adam, Shampoo, and Muon

Responsibilities

  • Lead and support the team that develops kernels, including but not limited to GEMMs and attention mechanisms, and contribute to enabling performance at scale for inference and training of next-generation GenAI (Llama) models
  • Enable the growth of individual contributors, drive the technical roadmap together with technical leads, and expand the team's impact by growing new skill sets and capabilities
  • Lead a high-performance team of engineers to deliver new capabilities and efficient compute systems for our fleet
  • Technical management
  • Experience in systems architecture, performance, workload analysis, and large-scale distributed systems
  • Work cross-functionally across hardware and software/services teams to drive engineering efforts