About The Position

In this role, you will design, build, and maintain scalable data pipelines that ingest, process, and transform large volumes of crowdsourced sensor data from heterogeneous mobile and edge devices. You will develop real-time streaming systems, create simulation and replay frameworks, support algorithm experimentation at scale, and ensure data quality across highly dynamic and noisy environments. You will work closely with machine learning engineers, hardware teams, and software engineering partners to architect data flows, optimize end-to-end latency, and deliver robust, production-ready systems that support features deployed on millions of devices worldwide.

Requirements

  • 5 years of experience developing data ingestion, transformation, and validation pipelines for noisy or heterogeneous datasets
  • 5 years of experience with cloud or distributed storage systems (S3, HDFS, object stores) and columnar formats (Parquet, ORC)
  • 3 years of experience designing real-time or distributed data pipelines with stream-processing frameworks (Kafka or equivalent)
  • 3 years of experience managing large-scale telemetry or sensor-data systems

Nice To Haves

  • Graduate degree in Computer Science or Data Science, or equivalent experience
  • Experience with IoT or mobile-sensing data, including intermittent connectivity and edge-generated telemetry
  • Experience with time-series data systems, sensor pipelines, or environmental/earth science data
  • Experience supporting simulation, replay, or regression pipelines for algorithm evaluation
  • Experience with data-quality frameworks, schema management, and observability tools

Responsibilities

  • Design, build, and maintain scalable data pipelines that ingest, process, and transform large volumes of crowdsourced sensor data from heterogeneous mobile and edge devices
  • Develop real-time streaming systems
  • Create simulation and replay frameworks
  • Support algorithm experimentation at scale
  • Ensure data quality across highly dynamic and noisy environments
  • Work closely with machine learning engineers, hardware teams, and software engineering partners to architect data flows
  • Optimize end-to-end latency
  • Deliver robust, production-ready systems that support features deployed on millions of devices worldwide