Broadcom • Posted 4 months ago
$81,000 - $130,000/Yr
Full-time • Entry Level
San Jose, CA
Computer and Electronic Product Manufacturing

Seeking a highly focused and motivated engineer to join a software team responsible for testing AI/ML interconnect solutions. The candidate should have a strong understanding of Ethernet functionality, TCP/IP networking, virtualization technologies, RDMA, and the PCIe protocol (Gen3 and above), along with a good understanding of AI/ML clusters, deep learning models, and GPU micro-benchmarks.

  • Create and review test scenarios, test cases, and test automation
  • Review design and functional specifications created by the development team to understand product functionality
  • Execute test activities and work closely with a multi-site team of developers and testers
  • Review User Documentation to ensure it clearly documents product functionality
  • Prioritize and manage multiple, parallel tasks, projects & releases
  • Bachelor's degree in Engineering with a minimum of two (2) years of hands-on test experience, or a Master's degree in Engineering
  • Strong analytical, problem-solving, and debugging skills.
  • Excellent communication skills; must be a critical thinker and a self-starter.
  • Possess a strong 'break feature mentality'
  • Possess a strong engineering mindset to develop thorough test cases
  • Strong networking experience with protocol testing and validation.
  • Experience with L2/L3 protocols, especially RoCE (RDMA over Converged Ethernet), and its use cases in AI/ML and HPC clusters.
  • Experience with AMD/NVIDIA GPUs, communication collectives (RCCL/NCCL), and libraries (ROCm/CUDA).
  • Experience using automation scripts in Python, primarily for network and system-level programming.
  • Experience with network test equipment such as protocol/PCIe analyzers, protocol jammers, and load generators (Ixia, IxChariot, Medusa tools, etc.) is a plus
  • Experience testing PCIe switches and good knowledge of PCIe is a plus
  • Knowledge of deep learning models (NLP, LLMs, recommendation, image classification) is a plus
  • Experience deploying BERT, Llama 2, or relevant models and micro-benchmarking (MLPerf) is a plus
  • Experience with Docker containers and deployment using Kubernetes/Ansible is a plus
  • Medical, dental and vision plans
  • 401(K) participation including company matching
  • Employee Stock Purchase Program (ESPP)
  • Employee Assistance Program (EAP)
  • Company paid holidays
  • Paid sick leave and vacation time