Network Simulation Engineer

Eridu AI, Saratoga, CA

About The Position

We are seeking a highly motivated Network Simulation Engineer to lead the simulation and analysis of AI communication workloads (e.g., collective communications) across various data center network topologies. In this role, you will apply network simulation tools to model real-world AI applications, including LLMs and DLRMs, to inform architectural decisions across Eridu’s product development lifecycle and to showcase our value to prospective customers and investors. You will collaborate cross-functionally with customers, ASIC designers, and simulation tool providers to optimize performance, influence design, and deliver transformative AI networking solutions.

Requirements

  • MSc or PhD in Computer Science, Electrical Engineering, or a related field with some specialization in AI/ML communications, or equivalent hands-on experience.
  • Strong experience with network simulation tools such as NS-3, OMNeT++, or custom-built simulators.
  • Familiarity with distributed training frameworks (e.g., PyTorch, TensorFlow), collective communication libraries (e.g., NCCL, RCCL), and GPU programming (CUDA or ROCm).
  • Deep understanding of frontier model architectures, parallelism approaches, and operational functionality.
  • Deep understanding of Ethernet, InfiniBand, and high-performance data center networking technologies.
  • Solid grasp of AI system architecture, including compute, memory, and interconnect bottlenecks in large-scale training/inference clusters.
  • Strong programming skills in C++ and Python.
  • Clear and confident communication skills, both written and verbal.
  • 2+ years of relevant experience preferred; exceptional early-career candidates will also be considered.

Responsibilities

  • Model AI Workloads: Simulate communication patterns of distributed AI workloads (e.g., LLMs, DLRMs) across diverse network topologies to analyze performance and scalability.
  • Drive Architecture Optimization: Work with customers to evaluate their AI workloads and provide recommendations for topology design, protocol tuning, and system architecture.
  • Influence ASIC Design: Collaborate with the internal ASIC and architecture teams by providing simulation-based insights that shape chip design for optimized AI traffic flows.
  • Tool Development & Partnership: Work with simulation tool providers (who maintain optimized versions of NS-3, ASTRA-sim, etc.) to customize, tune, and enhance modeling frameworks for Eridu’s specific requirements, and operate these tools to run simulations.
  • Documentation & Communication: Create clear and compelling reports, documentation, and presentations to communicate insights to technical and non-technical stakeholders.