Etched
Intern • Onsite • San Jose, CA • 51-100 employees

Architecture Intern - Inference
Location: San Jose, CA
Team: Architecture

About Etched

Etched is building the world’s first AI inference system purpose-built for transformers, delivering over 10x higher performance and dramatically lower cost and latency than a B200. With Etched ASICs, you can build products that would be impossible on GPUs, like real-time video generation models and extremely deep, highly parallel chain-of-thought reasoning agents. Backed by hundreds of millions of dollars from top-tier investors and staffed by leading engineers, Etched is redefining the infrastructure layer for the fastest-growing industry in history.

The Role

We are seeking a talented Architecture intern to join our team and contribute to the design of next-generation AI accelerators. This role focuses on developing and optimizing compute architectures that deliver exceptional performance and efficiency for transformer workloads. You will work on cutting-edge architectural problems and performance modeling, collaborating closely across functions to take innovative chip designs from concept to silicon.

Responsibilities

  • Support porting state-of-the-art models to our architecture.
  • Help build programming abstractions and testing capabilities to rapidly iterate on model porting.
  • Assist in building, enhancing, and scaling Sohu’s runtime, including multi-node inference, intra-node execution, state management, and robust error handling.
  • Contribute to optimizing routing and communication layers using Sohu’s collectives.
  • Utilize performance profiling and debugging tools to identify bottlenecks and correctness issues.
  • Develop and leverage a deep understanding of Sohu to co-design hardware instructions and model architecture operations that maximize model performance.
  • Implement high-performance software components for the Model Toolkit.
Qualifications

  • Progressing toward a Bachelor’s, Master’s, or PhD in computer science, computer engineering, or a related field.
  • Proficiency in C++ or Rust.
  • Understanding of performance-sensitive or complex distributed software systems, e.g. Linux internals, accelerator architectures (e.g. GPUs, TPUs), compilers, or high-speed interconnects (e.g. NVLink, InfiniBand).
  • Familiarity with PyTorch or JAX.
Nice to Have

  • Experience porting applications to non-standard accelerator hardware platforms.
  • Deep knowledge of transformer model architectures and/or inference serving stacks (vLLM, SGLang, etc.).
  • Experience building low-latency, high-performance applications using both kernel-level and user-space networking stacks.
  • Deep understanding of distributed systems concepts, algorithms, and challenges, including consensus protocols, consistency models, and communication patterns.
  • Solid grasp of transformer architectures, particularly Mixture-of-Experts (MoE).
  • Experience building applications with extensive SIMD (Single Instruction, Multiple Data) optimizations for performance-critical paths.
Benefits

  • Generous housing support for those relocating.
  • Daily lunch and dinner in our office.
  • Direct mentorship from industry leaders and world-class engineers.
  • Opportunity to work on one of the most important problems of our time.