GPU Performance Architect (PPA)

Samsung • San Jose, CA

About The Position

Samsung, a world leader in advanced semiconductor technology, is founded on a simple philosophy: the endless pursuit of excellence will create a better world for all. At Samsung Austin Research and Development Center (SARC) and Advanced Computing Lab (ACL), we are building a center of excellence for Intellectual Property (IP) that is applied to high-performance computing devices (mobile, automotive, and other custom market segments) consumed by millions of people around the world. Come build with us!

Role and Responsibilities

We are seeking a highly skilled GPU Performance Architect (PPA) to analyze, verify, and optimize the performance of our GPU system by identifying bottlenecks and by prototyping and proposing solutions that improve end-to-end performance. You will be responsible for executing and analyzing performance of our GPU, including, but not limited to, shader-level analysis, pipeline optimization, system-level performance characterization, verification, and prototyping, using programming languages such as C/C++ and Python and tools such as waveform-level debuggers (e.g., Synopsys Verdi) and a cycle-approximate performance model.

You have a strong background in GPU architecture, performance analysis, and verification, with experience in waveform-level and RTL debugging. You have a good understanding of the entire GPU system, with a focus on performance analysis, verification, and optimization.

You develop tools to analyze performance data, identify bottlenecks, and propose solutions to improve end-to-end performance of the GPU system.
You prototype on performance models and RTL, analyze the results, and propose GPU hardware optimizations.
You debug RTL code, including simulation, waveform analysis, and identifying and correcting functional errors to ensure correct implementation of the GPU architecture.
You verify and validate the performance of a GPU core, ensuring it meets the required specifications.
You analyze system performance using metrics such as latency, throughput, and power consumption to identify bottlenecks and optimization opportunities.
You develop and run benchmarks to characterize system performance and identify areas for improvement.
You stay up to date with the latest developments in graphics technology, including new APIs, tools, and methodologies.

Requirements

5+ years of experience with a Bachelor’s degree in Computer Science/Computer Engineering/relevant technical field, or 3+ years of experience with a Master’s degree, or 1+ years of experience with a PhD
Strong background in GPU system level architecture (not limited to a specific block), performance analysis, verification, and optimization
Basic to intermediate understanding of RTL (SystemVerilog/Verilog) is a must
Experience with waveform-level debugging tools (e.g., Synopsys Verdi) and RTL debugging
Proficiency in C/C++ and Python programming languages
Ability to work on the execution side, with a focus on performance analysis and optimization
Excellent analytical and problem-solving skills, with the ability to identify bottlenecks and propose solutions to improve performance
Team player with excellent communication and collaboration skills and experience working with cross-functional teams

Nice To Haves

Experience with prototyping GPU optimizations
Experience with GPU profiling tools (e.g., RenderDoc, PIX, AMD RGP, NVIDIA Nsight) to analyze and optimize graphics performance, power consumption, and system-level interactions
Knowledge of OpenGL, Vulkan, and DirectX 11/12
Familiarity with mobile platforms

Responsibilities

Analyze, verify, and optimize the performance of our GPU system, identifying bottlenecks, prototyping and proposing solutions to improve end-to-end performance.
Execute and analyze performance of our GPU, including, but not limited to, shader-level analysis, pipeline optimization, system-level performance characterization, verification, and prototyping, using programming languages such as C/C++ and Python and tools such as waveform-level debuggers (e.g., Synopsys Verdi) and a cycle-approximate performance model.
Develop tools to analyze performance data, identify bottlenecks, and propose solutions to improve end-to-end performance of the GPU system
Prototype on performance models and RTL, analyze the results, and propose GPU hardware optimizations.
Debug RTL code, including simulation, waveform analysis, and identifying and correcting functional errors to ensure correct implementation of the GPU architecture.
Verify and validate the performance of a GPU core, ensuring it meets the required specifications
Analyze system performance using various metrics, such as latency, throughput, and power consumption, to identify bottlenecks and optimization opportunities
Develop and run benchmarks to characterize system performance and identify areas for improvement
Stay up-to-date with the latest developments in graphics technology, including new APIs, tools, and methodologies