About The Position

XPENG is a leading smart technology company at the forefront of innovation, integrating advanced AI and autonomous driving technologies into its products, including electric vehicles (EVs), electric vertical take-off and landing (eVTOL) aircraft, and robotics. With a strong focus on intelligent mobility, XPENG is dedicated to reshaping the future of transportation through cutting-edge R&D in AI, machine learning, and smart connectivity.

We are looking for a versatile Machine Learning Infrastructure Engineer to join XPeng’s Fuyao AI Platform team, the core AI infrastructure platform powering autonomous driving, robotics, and intelligent cockpit applications. You will build and optimize next-generation AI infrastructure spanning dataloader, dataset, and data production systems, large-scale inference, and distributed compute platforms, with a strong focus on efficiency, scalability, and reliability.

Requirements

  • Master’s degree in Computer Science, Software Engineering, or equivalent experience.
  • 5+ years of experience in large-scale data processing or ML infrastructure.
  • Proficient in Python with solid software engineering fundamentals, clean coding practices, and strong debugging skills.
  • Hands-on experience with relational databases and NoSQL systems, including metadata and cache management; prior experience with large-scale VectorDB is highly desirable.
  • Familiarity with Linux file systems and network I/O optimization for distributed or object storage.
  • Strong communication skills and ability to work cross-functionally in fast-paced environments.
  • Strong ability to learn quickly, adapt to new challenges, and proactively explore and adopt new technologies.

Nice To Haves

  • Familiarity with the autonomous driving industry and enthusiasm for its challenges.
  • Experience with distributed computing frameworks such as Ray, Flink, or Spark.
  • Experience in building and scaling ML infrastructure in cloud-native environments.
  • Large-scale deep learning training or inference optimization focused on scalability and model acceleration.
  • Experience with columnar storage formats (Parquet/ORC) and related ecosystems, including partitioning, compression, and vectorized I/O optimization.
  • Experience with large-scale data loading frameworks (PyTorch DataLoader, Hugging Face Datasets).

Responsibilities

  • Design and optimize large-scale data processing, production, and loading pipelines, supporting heterogeneous data types (images, videos, point clouds, sensor streams, etc.).
  • Build and maintain high-performance dataset management and loading frameworks, ensuring low-latency, high-throughput pipelines for training and inference.
  • Develop and optimize distributed compute and inference systems, including scheduling, resource utilization, and performance tuning.
  • Collaborate with cross-functional teams (e.g. Algorithms, Data Lakehouse) to translate requirements into production-ready infrastructure solutions.
  • Continuously monitor, profile, and eliminate bottlenecks across the AI data, inference, and compute stacks.

Benefits

  • Bonus
  • Equity