Cornelis Networks - Chesterbrook, PA

posted 3 days ago

Full-time - Mid Level
Hybrid - Chesterbrook, PA
Professional, Scientific, and Technical Services

About the position

Cornelis Networks is hiring a talented AI/ML Application Performance Engineer to help drive innovation and contribute to the development of cutting-edge technologies in the semiconductor industry. In this role, you will provide technical expertise in AI and Machine Learning (ML) that can be applied to a diverse range of AI/ML use cases, working alongside a team of industry experts to shape the future of high-performance networking solutions.

Responsibilities

  • Perform benchmarking and optimization of open source and industry-standard AI/ML applications with current and future HPC hardware
  • Develop, execute, and maintain software required to run AI/ML applications and benchmarks
  • Participate in the development of supporting libraries and middleware
  • Assist sales and marketing teams by delivering proof points and performance benchmarking comparisons between Cornelis Omni-Path and competing interconnects
  • Collect and analyze performance data, identify performance limitations, and determine the best approach and techniques to improve performance
  • Present research findings both within the company and to external stakeholders
  • Collaborate with cross-functional teams across all levels of a corporation to evangelize the capabilities and performance advantages of Cornelis products

Requirements

  • Bachelor's degree (Master's preferred) in computer science, engineering, math, or related technical discipline
  • 3-5 years of experience running HPC and/or AI/ML applications on clusters
  • Ability to set up, run, and analyze AI/ML application benchmarks, with a proficient understanding of message passing, scaling optimization, and identifying performance bottlenecks
  • Ability to modify AI/ML models and distribute training across networks, beyond a single GPU compute platform
  • Experience with Message Passing Interface (MPI) and with compiling software using a variety of compilers (Intel, GCC, etc.) and libraries
  • Extensive Python and shell script experience
  • Experience with HPC network architectures such as Omni-Path, InfiniBand, or Ethernet
  • Experience operating in a UNIX or Linux computing environment
  • Excellent written and verbal communication skills

Nice-to-haves

  • Knowledge of HPC resource management and job scheduling systems (e.g., SLURM, PBS)
  • Hands-on experience analyzing and optimizing networks to improve scale-out performance using profiling tools such as NVIDIA Nsight Systems
  • Experience with AI frameworks like NeMo, PyTorch Lightning, Megatron-LM, and DeepSpeed

Benefits

  • Medical, dental, and vision coverage
  • Disability and life insurance
  • Dependent care flexible spending account
  • Accidental injury insurance
  • Pet insurance
  • Generous paid holidays
  • 401(k) with company match
  • Open Time Off (OTO) for regular full-time exempt employees
  • Sick time, bonding leave, and pregnancy disability leave