Staff Network Engineer

Lambda, San Francisco, CA

About The Position

Lambda, The Superintelligence Cloud, builds Gigawatt-scale AI Factories for Training and Inference. Lambda’s mission is to make compute as ubiquitous as electricity and give every person access to artificial intelligence. One person, one GPU. If you'd like to build the world's best deep learning cloud, join us. This position requires presence in our San Francisco or Seattle office location 4 days per week; Lambda’s designated work from home day is currently Tuesday.

Requirements

  • Have 15+ years of experience in designing and operating production datacenter networks
  • Have led the implementation of large production-scale networking projects
  • Expert in Clos (spine-and-leaf) fabrics, EVPN/VXLAN, ECMP, BGP, and fast convergence techniques
  • Have experience with multi-data center networks, backbone and hybrid cloud networks
  • Production experience with at least two switch/router vendors (e.g., Arista, Juniper, Cisco, NVIDIA/Mellanox, Cumulus/SONiC)
  • Experience with Next-Generation Firewalls (NGFW) (e.g., FortiGate, Juniper)
  • Experience with load balancers such as F5 or NetScaler
  • Are comfortable on the Linux command line, and have an understanding of the Linux networking stack
  • Strong automation skills (Python, Ansible) and network APIs

Nice To Haves

  • Hands-on with HPC/AI networking: RoCEv2 and/or InfiniBand (Congestion Control, VLs, partitions), GPUDirect RDMA concepts
  • Experience with DWDM technologies and SD-WAN
  • Understanding of data center power/space/cooling trade-offs and their impact on topology choices
  • Experience with observability tools such as Datadog, Splunk, Grafana, and Prometheus
  • Experience automating network configuration within public clouds, with tools like Terraform
  • Have led implementation of production-scale SDNs in a cloud context (e.g. helped implement the infrastructure that powers an AWS VPC-like feature)
  • Deep understanding of the Linux networking stack and its interaction with network virtualization
  • Experience with the SDN ecosystem (e.g., OVS, Neutron, DPDK, Cisco ACI or Nexus Fabric Controller, Arista CVP)

Responsibilities

  • Help scale Lambda’s high performance cloud network
  • Contribute to the reproducible automation of network configuration
  • Contribute to the design and development of software defined networks
  • Help manage Spine and Leaf networks
  • Ensure high availability of our network through monitoring, failover, and redundancy
  • Ensure VM clients have predictable networking performance through the use of QoS and other applicable technologies
  • Help with deploying and maintaining network monitoring and management tools

Benefits

  • Health, dental, and vision coverage for you and your dependents
  • Wellness and Commuter stipends for select roles
  • 401(k) plan with 2% company match (USA employees)
  • Flexible Paid Time Off Plan that we all actually use