Meta, posted 27 days ago
Full-time • Entry Level
Menlo Park, CA

At Meta, we're building and operating one of the world's most dynamic and fast-paced networks, powering our global data centers and supporting cutting-edge technologies like AI, generative AI, recommendation engines, and the metaverse. Our network infrastructure teams are responsible for developing, deploying, and operating this complex system, covering the entire network lifecycle from hardware development to operation. We're seeking software engineers to join our teams and help build scalable distributed systems, develop innovative solutions to our challenges, and ship them into production. As part of our network engineering teams, you'll have the opportunity to work on cutting-edge switching technology, collaborate with talented engineers, and contribute to the development of Meta's hyper-scale network infrastructure.

  • Design, develop, and validate drivers, firmware, and software for network devices, transport stacks, and AI workloads
  • Debug complex system-level issues and lead performance tuning exercises to optimize software stack performance
  • Understand software components from multiple partner teams, lead integration efforts, and drive continued development
  • Develop and automate test suites for the CI/CD framework and various components
  • Collaborate with partner teams to integrate software components, align on goals, and participate in oncall rotations
  • Design, develop, and deploy services to manage datacenter network switches and forwarding functions
  • Enhance HPC collective communication and parallel computing libraries (NCCL, RCCL, oneCCL, MPI)
  • Currently has, or is in the process of obtaining, a Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience. Degree must be completed prior to joining Meta
  • 2+ years of software development experience in industry settings, or a PhD degree plus 9 months of experience
  • Proficiency in C/C++ and at least one scripting language (Python/Shell Scripting)
  • Experience with network devices and products (routers, switches, adapters, load balancers) and an understanding of network routing protocols
  • Experience with developing and automating test suites
  • Systems programming, TCP/IP, HTTP/HTTPS, SPDY, DNS, and load balancers
  • Linux kernel, especially drivers and the network stack
  • Working knowledge of the transport stack, particularly Remote Direct Memory Access (RDMA) and/or RDMA over Converged Ethernet version 2 (RoCEv2)
  • Experience with QEMU or FPGA emulation environments is a plus
  • Parallel computing platforms such as CUDA, ROCm, and OpenCL
  • Experience with one of the following: platform services (programming, controlling, and monitoring optics, Physical Layer (PHY), FPGAs, sensors, fan control, power, etc.), Board Support Package (BSP), operating systems, kernel, bootloader, power management, Real-Time Operating System (RTOS), or Linux