Senior AI Data Infrastructure Engineer

XPENG · Santa Clara, CA
Posted 4d · $124,091 - $210,000

About The Position

XPENG is a leading smart technology company at the forefront of innovation, integrating advanced AI and autonomous driving technologies into its vehicles, including electric vehicles (EVs), electric vertical take-off and landing (eVTOL) aircraft, and robotics. With a strong focus on intelligent mobility, XPENG is dedicated to reshaping the future of transportation through cutting-edge R&D in AI, machine learning, and smart connectivity.

As a core member of our AI Infrastructure team, you will work at the intersection of Autonomous Driving and Foundation Models. We don't just process EB-scale perception data from tens of thousands of production vehicles; we are building the high-performance Data Engine that powers our next-generation AI. Your work will directly determine how our self-driving systems "learn" from massive datasets and define the cognitive ceiling of multi-modal models in the physical world.

Requirements

  • Engineering Excellence: BS/MS/PhD in Computer Science or a related field, with a proven track record of building large-scale distributed systems.
  • Work Experience: 3-5 years of industry experience.
  • Programming Mastery: Proficient in Python, C++, or Java, with a deep understanding of high-performance concurrent programming and systems design.
  • Distributed Frameworks: Hands-on experience with at least one distributed processing framework, such as Ray or Spark.
  • Lakehouse Expertise: Familiarity with Data Lakehouse concepts and practical experience with technologies like Iceberg and Lance.

Nice To Haves

  • Experience building data warehouses for trillion-token datasets or PB-scale multi-modal data.
  • Deep understanding of data access patterns in deep learning frameworks like PyTorch, DeepSpeed, or Megatron.
  • Practical experience with Vector Databases, automated labeling toolchains, or data-centric AI workflows.
  • Knowledge of storage formats optimized for AI (e.g., Parquet, Lance) and high-performance file systems.

Responsibilities

  • Scalable Data Pipelines: Architect and build scalable, end-to-end pipelines to automate the ingestion, cleaning, and processing of PB-scale raw data for both production autonomy and multi-modal LLMs.
  • Modern Lakehouse Architecture: Evolve our data storage solutions based on Apache Iceberg and Lance to implement efficient semantic indexing, metadata management, and data versioning.
  • Training Throughput Optimization: Optimize data loading and pre-fetching strategies to maximize throughput for large-scale training on 10,000+ GPU clusters.
  • Infrastructure Evolution: Enable the seamless transformation of foundation-model data into actionable training sets, bridging the gap between raw vehicle logs and model-ready tokens.

Benefits

  • Bonus
  • Equity
  • Benefits