About The Position

As a core member of our AI Infrastructure team, you will be responsible for building the end-to-end data pipeline for autonomous driving, covering the entire chain from onboard data upload → cloud-based preprocessing → dataset production → model training / simulation input. In autonomous driving systems, the stability and efficiency of the pipeline directly determine the speed of algorithm iteration. You will help us build a reliable, observable, and cost-effective data pipeline that supports the daily flow of petabyte-scale sensor data.

Requirements

  • Bachelor's degree or higher in Computer Science, Software Engineering, Artificial Intelligence, or related fields.
  • 5-8+ years of experience in large-scale data processing or data platform development.
  • Proficiency in at least one programming language among Python / Go / Java.
  • Solid software engineering foundation, good coding standards, and a strong sense of code quality.
  • Hands-on project experience in at least two of the following areas:
      • Design and development of large-scale data pipelines / ETL systems, with end-to-end experience in data cleaning, transformation, and loading.
      • Production-level experience with distributed message queues (Kafka / Pulsar / RabbitMQ), and familiarity with stream processing paradigms.
      • Experience with distributed data lake systems (e.g., Apache Iceberg), including familiarity with Iceberg's table format, partition evolution, and snapshot isolation, with practical performance tuning and deployment experience.
      • Experience with columnar storage formats (e.g., Lance) and related query engines, with practical application in large model training.
      • Hands-on experience using and optimizing relational databases (MySQL / PostgreSQL) and NoSQL databases (Redis / MongoDB), with an understanding of metadata management and caching strategies.
      • Experience in performance optimization and troubleshooting for large-scale distributed systems, with the ability to quickly locate and resolve complex performance bottlenecks.
      • Experience with Kubernetes / Docker containerized deployment.
  • Strong cross-team communication and collaboration skills, high sense of responsibility, and proactive problem-solving attitude.

Nice To Haves

  • Familiarity with data closed loops in the embodied AI industry is a strong plus.
  • Some understanding of the autonomous driving industry, awareness of data closed loop and data flywheel concepts, and enthusiasm for this field.
  • Experience with AI infrastructure or model training workflows (e.g., data loading, feature engineering, data preparation for model evaluation).
  • Familiarity with data lake / data warehouse systems, with practical experience implementing data version control and data lineage tracing.
  • Open-source contributions on GitHub or a technical blog, with continuous attention to the latest technological trends in big data / AI infrastructure.

Responsibilities

  • Responsible for the design and construction of core data closed-loop pipelines.
  • Develop toolchains for data cleaning, annotation quality inspection, and data mining to support the algorithm team in quickly locating model error cases and driving iterative model optimization.
  • Provide data support for production and R&D processes, including log event tracking, connected-vehicle data, internal and external data collection, data synchronization, data cleaning and standardization, data modeling, offline and real-time data processing, data as a service, and data visualization.
  • Support business operations such as autonomous driving, smart cockpits, overseas data collection, and robotics data collection.
  • Responsible for optimizing the performance of the entire data pipeline (collection, cleaning, and conversion).
  • Solve bottlenecks in large-scale data transmission, memory management, I/O, etc., and build a distributed data processing system with high throughput and low latency.
  • Responsible for building a data management platform covering the entire process from data collection to data lake ingestion to model training.
  • Implement capabilities for data version control, data lineage tracing, metadata management, and fast data retrieval to support unified data access and collaboration across multiple teams.
  • Collaborate with the large model team and other technical teams to deeply understand business requirements, respond quickly, and ensure successful implementation.

Benefits

  • A fun, supportive, and engaging environment.
  • Infrastructure and computational resources to support your work.
  • Opportunity to work on cutting-edge technologies alongside top talent in the field.
  • Opportunity to make a significant impact on the transportation revolution by advancing autonomous driving.
  • Competitive compensation package.
  • Snacks, lunches, dinners, and fun activities.
  • Bonus
  • Equity
  • Health insurance
  • Dental insurance
  • Vision insurance
© 2026 Teal Labs, Inc