About The Position

Design and implement end-to-end data pipelines (ETL/ELT) that ingest, process, and curate large-scale enterprise data, including telemetry/vehicle data and other structured and unstructured sources. Migrate and modernize data assets to a centralized data platform (e.g., BigQuery) using principled data lake/warehouse architectures (Medallion architecture with Bronze/Silver/Gold layers) to power analytics and reporting. Architect scalable data models and data warehouses, optimizing for query performance, maintainability, and cost efficiency. Develop and operate robust orchestration pipelines using Airflow/Astronomer or BigQuery Scheduled Queries, with secure, reproducible CI/CD workflows (Terraform + Git).

Build and maintain reliable data quality checks, lineage, and monitoring with observability tools (e.g., Splunk and Looker/Grafana/Tableau/Power BI dashboards) to rapidly detect and resolve data issues. Implement data governance, security, and compliance controls (data lineage, access controls, PII/PHI protection) in collaboration with security and privacy teams. Lead the design and delivery of analytics-ready data assets for cross-functional teams, including dashboards, alerts, and self-service analytics.

Mentor and coach junior engineers, review code, and drive best practices in data engineering, testing, and documentation. Collaborate with data scientists, product managers, and business stakeholders to translate requirements into scalable data solutions and timely insights. Manage cost monitoring and capacity planning for cloud resources, and optimize storage and compute usage across GCP services (BigQuery, Dataflow, Dataproc, GCS). Participate in on-call rotations and incident response to maintain high availability of data services.

Established and active employee resource groups.

Requirements

  • Master's degree in Computer Science.
  • 7+ years of experience in data engineering, data platforms, or a similar role.
  • 4+ years of hands-on experience with Google Cloud Platform (BigQuery, Cloud Storage, Dataflow, Dataproc; Scheduled Queries or equivalent scheduling/orchestration) or AWS.
  • 4+ years of experience with Python and SQL; strong PySpark experience is a plus.
  • 4+ years of experience with ETL/ELT design, data modeling, data warehousing, and data governance.
  • Practical experience building and operating data pipelines with orchestration tools (Airflow/Astronomer, Scheduled Queries).
  • Experience with infrastructure-as-code and CI/CD (Terraform, Git, and related tooling).
  • Demonstrated ability to design and implement analytics-ready data assets and dashboards; familiarity with BI tools (Looker, Tableau, Power BI, Grafana) for monitoring and reporting.
  • Strong communication skills and ability to work effectively with cross-functional teams (engineering, analytics, product, security).

Responsibilities

  • Design and implement end-to-end data pipelines (ETL/ELT)
  • Migrate and modernize data assets to a centralized data platform
  • Architect scalable data models and data warehouses
  • Develop and operate robust orchestration pipelines using Airflow/Astronomer or Scheduled Queries
  • Build and maintain reliable data quality checks, lineage, and monitoring with observability tools
  • Implement data governance, security, and compliance controls
  • Lead the design and delivery of analytics-ready data assets for cross-functional teams
  • Mentor and coach junior engineers, review code, and drive best practices in data engineering, testing, and documentation
  • Collaborate with data scientists, product managers, and business stakeholders to translate requirements into scalable data solutions and timely insights
  • Manage cost monitoring and capacity planning for cloud resources; optimize storage and compute usage across GCP services
  • Participate in on-call rotations and incident response to maintain high availability of data services