Senior Solutions Engineer, AI/HPC Networking

DRIVENETS
Palo Alto, CA
Remote

About The Position

DriveNets is a leader in disaggregated high-scale networking solutions for service providers and AI infrastructures. Founded in December 2015, DriveNets created a radical new way to build networks by adapting the architectural model of the cloud to telco-grade networking. This solution accelerates network deployment, improves the network’s economic model, and radically simplifies network operations. Customers include Comcast, Orange, and KDDI, and over 80% of AT&T’s network traffic now runs through a disaggregated core powered by DriveNets software. The DriveNets Network Cloud-AI solution, based on the same technology, was introduced to the market in 2023. It provides the highest-performance Ethernet-based AI networking solution and is already deployed by hyperscalers, NeoClouds, and enterprises. Having raised over $587 million in three funding rounds, DriveNets continues to deploy the most innovative network infrastructure and is looking for the most talented people to be part of this journey.

Requirements

  • BS/MS/PhD in Electrical/Computer Engineering, Computer Science, Physics, or other Engineering fields, or equivalent experience.
  • 3+ years of network engineering (system/solution) experience.
  • 3+ years of solution architecture/sales engineering experience, or equivalent, working for a vendor, value-added reseller, or system integrator.
  • Technical expertise in data center or high-end enterprise network design (e.g., BGP, EVPN, VXLAN, QoS, Multicast).
  • Expertise with data center design, including networking, compute, and storage.
  • Ability to write extensive technical content (white papers, technical briefs, etc.) for external audiences with a balance of technical accuracy, strategy, and clear messaging.
  • Ability to multitask efficiently in a multifaceted environment and to work with teams across geographic locations.
  • Clear written and oral communication skills with the ability to effectively collaborate with executives and engineering teams.
  • Ability to travel domestically and internationally up to 20% of the time.
  • Be Kind!

Nice To Haves

  • Familiarity with AI-relevant data center infrastructure and networking technologies such as InfiniBand, RoCEv2, lossless Ethernet technologies (PFC, ECN, etc.), accelerated computing, GPU, NIC, DPU, etc.
  • Understanding of AI/HPC networking infrastructure solutions and their advantages and disadvantages (AI/HPC networking design, high-speed interconnect technologies):
      • Scale-up: NVLink, UALink, etc.
      • Scale-out: Ethernet and Enhanced Ethernet (Scheduled Ethernet, dynamic load balancing and adaptive routing, Spectrum-X, UEC, etc.), InfiniBand
      • Backend storage connectivity
  • Understanding of data center operations fundamentals in networking, cooling, and power.
  • Familiarity with monitoring tools (e.g., Prometheus, Grafana, ELK Stack) and telemetry (gRPC, gNMI, OTLP, etc.).
  • Proven experience with one or more Tier-1 Clouds (AWS, Azure, GCP, or OCI) or emerging Neoclouds, as well as cloud-native architectures and software.

Responsibilities

  • Building robust AI/HPC infrastructure for new and existing customers.
  • Technical hands-on role in building and supporting NVIDIA/AMD-based platforms.
  • Support operational and reliability aspects of large-scale AI clusters, focusing on performance at scale, training stability, real-time monitoring, logging, and alerting.
  • Administer Linux systems, ranging from powerful GPU-enabled servers to general-purpose compute systems.
  • Design and plan rack layouts and network topologies to support customer requirements.
  • Design and evaluate automation scripts for network operations, configuring server and switch fabrics.
  • Perform data center upgrades and ensure smooth deployment of DriveNets solutions.
  • Install and configure DriveNets products, ensuring optimal performance and customer satisfaction.
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
  • Engage in and improve the whole lifecycle of services from inception and design through deployment, operation, and refinement.
  • Provide feedback to internal teams such as opening bugs, documenting workarounds, and suggesting improvements.
  • Engage with sales teams and customers to ensure success with major opportunities and deployments.
  • Introduce new products to DriveNets’ sales and support teams and to DriveNets’ customers.
  • Deliver technical training sessions and TOIs for support/sales engineers, partners, and customers.
  • Collaborate on product definition through customer requirement gathering and roadmap planning.