Principal Engineer, GPU Performance Architect (PPA)

Samsung Electronics • San Jose, CA

$279,200 - $364,800 • Onsite

About The Position

Samsung, a world leader in advanced semiconductor technology, is founded on a simple philosophy: the endless pursuit of excellence will create a better world for all. At Samsung Austin Research and Development Center (SARC) and Advanced Computing Lab (ACL), we are building a center of excellence for Intellectual Property (IP) applied to high-performance computing devices (mobile, automotive, and other custom market segments) used by millions of people around the world. Come build with us!
As a Principal GPU Performance Architect (PPA), you will drive the analysis, verification, and optimization of end-to-end system performance for Samsung’s premium mobile GPUs. In this high-impact, highly visible technical leadership role, you will shape performance strategy and guide architectural decisions that define the efficiency and scalability of Samsung’s GPU designs. You bring deep expertise in GPU architecture, performance analysis and verification, and waveform-level and RTL debugging, with a proven record of advancing performance and power efficiency across complex systems. You are a domain expert in multiple areas.
You will lead the design and execution of advanced performance modeling and analysis for GPU pipelines, including shader-level analysis, pipeline optimization, system-level characterization, and architectural prototyping. You build and refine models and tools using C/C++, Python, and cycle-approximate frameworks to analyze and validate key metrics (such as latency, throughput, and power consumption), develop benchmarks, identify bottlenecks, and propose data-driven optimizations. You ensure architectural excellence and correctness through prototyping, model-to-RTL correlation, and deep-dive validation, including simulation, waveform analysis using tools like Synopsys Verdi, functional debug, and performance verification against required specifications.
You spearhead cross-functional collaboration with architecture, design, and software teams to propose, evaluate, and guide the implementation of architectural improvements. You inspire high performance by mentoring engineers, fostering a culture of ownership and innovation, and staying ahead of emerging GPU technologies, tools, and performance methodologies.

Requirements

15+ years of experience with a Bachelor’s Degree in Computer Science/Engineering, or 13+ years of experience with a Master’s Degree, or 11+ years of experience with a Ph.D.
15+ years of experience in broad GPU system-level architecture (not limited to a specific block), performance analysis, verification, and optimization.
In-depth expertise in RTL (SystemVerilog/Verilog).
Strong experience with waveform-level debugging tools (e.g., Synopsys Verdi) and RTL debugging.
Strong programming skills in C/C++ and Python.
Proven experience with leading performance strategy, with a focus on performance modeling, analysis, verification, and optimization.
Excellent analytical and problem-solving skills, with the ability to identify bottlenecks, propose solutions, and guide teams through implementation.
Excellent communication and collaboration skills, with the ability to navigate ambiguity and influence in a fast-paced, global team environment.

Nice To Haves

Experience with prototyping GPU optimizations.
Experience with GPU profiling tools (e.g., RenderDoc, PIX, AMD RGP, NVIDIA Nsight) to analyze and optimize graphics performance, power consumption, and system-level interactions.
Knowledge of OpenGL, Vulkan, and DirectX 11/12.
Experience with mobile platforms.

Responsibilities

Drive the analysis, verification, and optimization of end-to-end system performance for Samsung’s premium mobile GPUs.
Shape performance strategy and guide architectural decisions that define the efficiency and scalability of Samsung’s GPU designs.
Lead the design and execution of advanced performance modeling and analysis for GPU pipelines—including shader-level analysis, pipeline optimization, system-level characterization, and architectural prototyping.
Build and refine models and tools using C/C++, Python, and cycle-approximate frameworks to analyze and validate key metrics (such as latency, throughput, and power consumption), develop benchmarks, identify bottlenecks, and propose data-driven optimizations.
Ensure architectural excellence and correctness through prototyping, model-to-RTL correlation, and deep-dive validation—including simulation, waveform analysis using tools like Synopsys Verdi, functional debug, and performance verification against required specifications.
Spearhead cross-functional collaboration with architecture, design, and software teams to propose, evaluate, and guide the implementation of architectural improvements.
Inspire high performance by mentoring engineers, fostering a culture of ownership and innovation, and staying ahead of emerging GPU technologies, tools, and performance methodologies.