Performance Architect

Sandisk, Milpitas, CA

About The Position

In this position, you will develop High Bandwidth Flash (HBF) based advanced system architectures and complex simulation models for Sandisk's next-generation products. You will initiate and analyze changes to the product architecture. Typical activities include designing, programming, debugging, and modifying simulation models to evaluate these changes and to assess the performance, power, and endurance of the product. You will work closely with outstanding engineering colleagues, tackle complex challenges, innovate, and develop products that will change the data-centric architecture paradigm.

Requirements

  • Bachelor's, Master's, or PhD in Computer/Electrical Engineering with 3+ years of relevant experience in performance modeling, simulation, and analysis using SystemC

Nice To Haves

  • 3+ years of experience with SystemC modeling
  • Deep experience optimizing large-scale ML systems and GPU architectures
  • Good understanding of computer/graphics architecture, ML, LLM
  • Strong track record of technical leadership in GPU performance and workload analysis
  • Expert knowledge of transformer architectures, attention mechanisms, and model parallelism techniques
  • Experience with GPU or TPU and system microarchitecture
  • Proficiency in principles and methods of microarchitecture, software, and hardware relevant to performance engineering
  • Capable of developing a wide system view for complex AI/ML accelerator ASIC systems
  • Proficiency with SoC and system performance analysis fundamentals, tools, and techniques including hardware performance monitors and PERF profiling
  • Familiarity with IO subsystem microarchitecture performance modeling; background in NVMe/PCIe/UCIe/CXL/NVLink microarchitecture and protocols is a plus
  • Experience with simulation using SystemC and TLM, behavioral modeling, and performance analysis - advantage
  • Multi-disciplinary experience, including familiarity with Firmware and ASIC design
  • ML frameworks and tooling: PyTorch, CUDA, TensorRT, OpenAI Triton, and ONNX
  • Distributed systems: Ray, Megatron-LM
  • Performance analysis tools: Nsight Compute, nvprof, PyTorch Profiler
  • KV cache optimization, Flash Attention, Mixture of Experts
  • High-speed networking: InfiniBand, RDMA, NVLink
  • Expertise in CUDA programming, GPU memory hierarchies, and hardware-specific optimizations
  • Proven track record architecting distributed training systems at large scale
  • Previous experience with storage systems, protocols, and NAND flash – advantage
  • Experience with datacenter and AI workload analysis and optimization
  • Experience with multi-core systems and multi-thread interactions
  • Experience analyzing and optimizing workloads

Responsibilities

  • Build SystemC performance models for HBF-based products, covering the end-to-end path from GPU/TPU/NPU/xPU through the host interface, memory hierarchy, and base-die controller to HBF, across various packaging technologies
  • Improve AI/ML ASIC architecture performance through hardware and software co-optimization and post-silicon performance analysis, and influence the strategic product roadmap
  • Analyze and characterize workloads of our ASICs and of competitive datacenter and AI solutions to identify opportunities for performance improvement in our products
  • Collaborate with the Architecture team to resolve performance issues and optimize the performance and TCO of HBF-based datacenter technologies
  • Model one or more components of AI/ML accelerator ASICs such as HBM, PCIe/UCIe/CXL, NoC, DMA, firmware interactions, NAND, xPU, fabrics, etc.
  • Model and optimize performance for multi-trillion-parameter LLM training/inference, including dense and Mixture of Experts (MoE) models with multiple modalities (text, vision, speech)
  • Model/optimize novel parallelization strategies across tensor, pipeline, context, expert and data parallel dimensions
  • Architect memory-efficient training systems utilizing techniques like structured pruning, quantization (MX formats), continuous batching/chunked prefill, speculative decoding
  • Incorporate and extend SOTA models such as GPT-4, reasoning models like DeepSeek-R1, and multi-modal architectures
  • Collaborate with internal and external stakeholders/ML researchers to disseminate results and iterate at rapid pace
  • In the HBF Performance Architecture Group, we build on our depth in microarchitecture expertise and simulation to analyze and optimize high-performance ASIC designs for critical areas such as AI/ML accelerators, cloud computing, and high-performance computing.

Benefits

  • paid vacation time
  • paid sick leave
  • medical/dental/vision insurance
  • life, accident and disability insurance
  • tax-advantaged flexible spending and health savings accounts
  • employee assistance program
  • other voluntary benefit programs such as supplemental life and AD&D, legal plan, pet insurance, critical illness, accident and hospital indemnity
  • tuition reimbursement
  • transit
  • the Applause Program
  • employee stock purchase plan
  • the Sandisk Savings 401(k) Plan