Principal Engineer, AI Serving Framework Architect (Software)

Samsung Semiconductor, San Jose, CA
Onsite

About The Position

The Architecture Research Lab (ARL) focuses on addressing fundamental system-level bottlenecks in modern AI, particularly in memory capacity/bandwidth and system-scale communication. By leveraging Samsung’s world-class memory technologies, ARL explores and defines next-generation AI system architectures that deliver step-function improvements in performance, efficiency, and scalability. We are seeking a Principal AI System Architect who will play a key role in bridging AI workloads, system architecture, and hardware design. In this role, you will develop system-level performance models, drive architecture-level design decisions, and propose forward-looking AI system architectures that shape Samsung’s long-term AI platform strategy.

Requirements

  • PhD in Computer Science or a related field with 15+ years of experience in AI serving frameworks for large-scale computing, with a focus on AI workloads.
  • Experience leading a project to build and optimize a Large Language Model (LLM) Inference Software Stack on a multi-rack scale system to deliver AI Inference services to over 100,000 users.
  • Extensive experience in designing AI Inference Software Stacks for heterogeneous devices.
  • In-depth understanding of the internal architecture and operation mechanisms of inference engines such as vLLM.
  • Proficiency in AI Inference System Profiling and optimization.
  • Knowledge and practical experience with future AI workloads, including reasoning models, multi-modal solutions, AI agents, and world models.
  • Strong understanding of compute, memory, and networking bottlenecks in AI systems.
  • Required skills: PyTorch, Python, and C++.
  • A collaborative mindset, curiosity, and resilience in solving complex challenges.
  • Excellent verbal, presentation, and written communication skills.
  • You’re inclusive, adapting your style to the situation and diverse global norms of our people.
  • You approach challenges with curiosity and resilience, seeking data to help build understanding. You’re collaborative, building relationships, humbly offering support, and openly welcoming approaches.
  • Innovative and creative, you proactively explore new ideas and adapt quickly to change.

Nice To Haves

  • Native or fluent Korean speakers are preferred.

Responsibilities

  • Leading research teams in Korea as a Tech Lead and proposing technical direction
  • Researching dynamic scheduling methodologies for maximizing AI inference performance in multi-rack scale memory-centric systems composed of heterogeneous compute-capable memory and hierarchical memory
  • Investigating methods to accelerate search operations in vector DBs for RAG and knowledge graphs for AI agents by leveraging compute-capable memory
  • Studying strategies for optimally placing KVCache and a vector DB in hierarchical memory to minimize frequent SSD accesses and reduce IO stalls
  • Proposing SW design for implementing the derived optimization algorithms on open-source platforms such as vLLM

Benefits

  • Give Back: With a charitable giving match and frequent opportunities to get involved, we take an active role in supporting the community.
  • Enjoy Time Away: You’ll start with 4+ weeks of paid time off a year, plus holidays and sick leave, to rest and recharge.
  • Care for Family: Whatever family means to you, we want to support you along the way, including a stipend for fertility care or adoption, medical travel support, and virtual vet care for your fur babies.
  • Prioritize Emotional Wellness: With on-demand apps and free confidential therapy sessions, you’ll have support no matter where you are.
  • Stay Fit: Eating well and being active are important parts of a healthy life. Our onsite Café and gym, plus virtual classes, make it easier.
  • Embrace Flexibility: Benefits are best when you have the space to use them. That’s why we facilitate a flexible environment so you can find the right balance for you.