Sr. Cloud AI Infrastructure Engineer

Tencent | Palo Alto, CA
$145,100 - $273,200 | Onsite

About The Position

This role involves conducting in-depth research into the underlying hardware logic of various AI accelerators and evaluating the power-efficiency and suitability of heterogeneous architectures for Large Language Model (LLM) inference and training. It also includes designing and optimizing high-performance operator libraries for large-scale cloud computing environments and resolving long-tail latency issues in hardware scheduling, memory management, and distributed communication. The engineer will define the interconnect architecture, driving the virtualization, standardized access, and efficient pooling of heterogeneous computing resources in the cloud. Additionally, the role requires monitoring global trends in semiconductors and accelerators, and performing feasibility studies and experimental validation for implementing emerging technologies within cloud infrastructure.

Tencent is a world-leading internet and technology company that develops innovative products and services to improve the quality of life for people around the world.

Requirements

  • Education: Master’s or Ph.D. degree in Computer Engineering, Electronic Engineering, Microelectronics, or a related field.
  • Core Expertise: Expertise in GPGPU architectures or other mainstream AI accelerator architectures.
  • Programming & Frameworks: Proficient in parallel computing frameworks; deep understanding of low-level operator development languages (e.g., CUDA, Triton).
  • Network & Distributed Systems: Solid understanding of large-scale distributed systems, cluster topologies (e.g., Fat-tree, Torus), and high-performance network protocols.
  • Industry Insight: Familiar with the architectural evolution of global leading computing enterprises; ability to objectively analyze the technical pros/cons and engineering challenges of different architectural paths.

Nice To Haves

  • Experience in the application, optimization, or architectural design of ultra-large-scale accelerator clusters is preferred.
  • Experience in the low-level adaptation and performance tuning of mainstream deep learning frameworks (e.g., PyTorch, TensorFlow) is preferred.

Responsibilities

  • Architecture Research: Conduct in-depth research into the underlying hardware logic of various AI accelerators; evaluate the power-efficiency ratio and suitability of different heterogeneous architectures in the context of Large Language Model (LLM) inference and training.
  • Operator & Performance Optimization: Design and optimize high-performance operator libraries for large-scale cloud computing environments; resolve long-tail latency issues in hardware scheduling, memory management, and distributed communication.
  • Interconnect Architecture Definition: Define the interconnect architecture; drive the virtualization, standardized access, and efficient pooling of heterogeneous computing resources in the cloud.
  • Technology Trend Analysis: Monitor global trends in semiconductors and accelerators; perform feasibility studies and experimental validation for the implementation of emerging technologies within cloud infrastructure.

Benefits

  • Sign-on payment
  • Relocation package
  • Restricted stock units
  • Medical benefits
  • Dental benefits
  • Vision benefits
  • Life and disability benefits
  • Participation in the Company’s 401(k) plan
  • 15 to 25 days of vacation per year (depending on the employee’s tenure)
  • Up to 13 days of holidays throughout the calendar year
  • Up to 10 days of paid sick leave per year
© 2024 Teal Labs, Inc