Senior Staff AI Software Engineer

Samsung Semiconductor•San Jose, CA

10d•Onsite

About The Position

The AGI (Artificial General Intelligence) Computing Lab is dedicated to solving the complex system-level challenges posed by the growing demands of future AI/ML workloads. Our team is committed to designing and developing scalable platforms that can effectively handle the computational and memory requirements of these workloads while minimizing energy consumption and maximizing performance. To achieve this goal, we collaborate closely with both hardware and software engineers to identify and address the unique challenges posed by AI/ML workloads and to explore new computing abstractions that can provide a better balance between the hardware and software components of our systems. Additionally, we continuously conduct research and development in emerging technologies and trends across memory, computing, interconnect, and AI/ML, ensuring that our platforms are always equipped to handle the most demanding workloads of the future. By working together as a dedicated and passionate team, we aim to revolutionize the way AI/ML applications are deployed and executed, ultimately contributing to the advancement of AGI in an affordable and sustainable manner. Join us in our passion to shape the future of computing!

Requirements

Bachelor’s degree with 15+ years, Master’s degree with 13+ years, or PhD with 10+ years of industry experience.
Strong experience developing high-performance AI framework software for GPUs or other accelerators.
Strong end-to-end understanding of AI infrastructure and the AI software stack, from model definition through deployment and serving.
Solid understanding of LLM architectures and workflows, including modern transformer-based designs.
Solid understanding of agentic AI architecture and workflows.
Hands-on expertise with the PyTorch framework.
Practical experience with vLLM for high-throughput model inference and serving.
Solid understanding of the memory wall problem and its impact on AI system performance.
Strong knowledge of memory architecture, including High Bandwidth Memory (HBM), and familiarity with memory-centric acceleration and compute approaches.
Proficiency working in a Linux development environment.
Solid command of development tooling, including agentic coding tools, GitHub, and Jira.

Responsibilities

Lead the co-design of software and hardware solutions that optimize AI model inference performance, with a focus on overcoming memory bottlenecks.
Analyze and optimize LLM and agentic AI workloads across the full software stack, identifying opportunities for hardware-aware acceleration.
Profile and characterize model execution to expose memory wall limitations and guide architectural decisions for HBM and memory-centric compute.
Collaborate with hardware teams to influence memory architecture, acceleration strategies, and compute placement based on real workload behavior.
Develop, optimize, and benchmark inference and serving solutions using frameworks such as PyTorch and vLLM.
Define best practices and provide technical mentorship across software–hardware co-design efforts.