Oracle • Posted 3 months ago
Full-time • Manager
Austin, TX
5,001-10,000 employees
Publishing Industries

The AI2NE Org strives to be a global leader in the RDMA cluster networking domain and to enable seamless, accelerated High-Performance Compute (HPC), Artificial Intelligence, and Machine Learning advancements. We envision a future where artificial intelligence and machine learning revolutionize industries, reshape societies, and unlock limitless possibilities. Our vision is to be a pioneering force, driving the development and design of state-of-the-art RDMA clusters tailored specifically for AI, ML, and HPC workloads. We strive to be the go-to experts in RDMA cluster architecture, leveraging our deep understanding of the unique demands of AI/ML and HPC applications. By staying at the forefront of technological advancements, we aim to redefine the boundaries of what is possible, pushing the envelope of computational capabilities and unlocking unprecedented performance.

We're looking for a hands-on leader with strong management experience to help us build new features and grow our team. The role involves leading a team of network development engineers in a fast-paced environment that requires agility and the drive to deliver. The team will be responsible for provisioning, securing, scaling, and operating the network stack required to run distributed AI workloads across a cluster spanning thousands of GPUs. The candidate should be comfortable building complex distributed systems that manage and control hundreds of thousands of network devices.

  • Lead a team of network development engineers.
  • Build new features and grow the team.
  • Provision, secure, scale, and operate the network stack for distributed AI workloads.
  • Build distributed systems that manage and control hundreds of thousands of network devices.
  • BS or MS in Computer Science, Network/Electrical Engineering, or equivalent experience.
  • 7+ years of experience in large-scale physical network support.
  • 2-4 years of people management experience.
  • 2+ years of experience in network development/deployment at scale.
  • Experience building scalable, cloud-native distributed systems.
  • Ability to work in a collaborative, cross-functional team environment.
  • Solid understanding of key networking technologies for the cloud.
  • Ability to effectively communicate technical ideas verbally and in writing.
  • Understand the end-to-end configuration and technical dependencies of production services.
  • Experience with production operations and best practices for deploying code in production.
  • Flexible medical options.
  • Life insurance options.
  • Retirement options.
  • Volunteer programs.