Data Engineer

SumerSports
Remote

About The Position

As a Data Engineer, you’ll design, build, and maintain the data pipelines that power our deep learning and LLM systems. You’ll work across ingestion, transformation, and orchestration layers — from real-time feeds to analytics-ready datasets. Your mission is to make data reliable, discoverable, and scalable for use by model training, analytics, and AI-driven products across multiple sports. You’ll collaborate closely with our MLOps, LLMOps, and Sports Data teams to ensure seamless integration between data and AI.

Requirements

  • 3–6 years of experience as a Data Engineer or ETL Developer in a production environment.
  • Proficiency in Python and SQL; strong familiarity with Databricks, Spark, or equivalent big-data frameworks.
  • Experience with workflow orchestration tools such as Airflow, Dagster, Luigi, or Prefect.
  • Deep understanding of data modeling, data warehousing, and distributed data processing.
  • Knowledge of modern data lakehouse architectures (Delta, Parquet, Iceberg); a minimal sketch follows this list.
  • Familiarity with CI/CD, GitHub Actions, and data pipeline testing frameworks.
  • Comfort working in a cross-functional environment with ML, product, and analytics teams.
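
To make these requirements concrete, here is a minimal, hypothetical sketch of the Spark-plus-Delta pattern named above. The session config, dataset paths, and column names are illustrative assumptions rather than SumerSports internals, and the Delta support shown assumes the delta-spark package is installed.

from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("plays-etl-sketch")
    # Delta Lake support assumes the delta-spark package is installed.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Ingest raw play-by-play records (path and columns are placeholders).
raw = spark.read.parquet("/data/raw/plays/")

# Transform into an analytics-ready aggregate: deduplicate, filter, roll up.
clean = raw.dropDuplicates(["play_id"]).filter(F.col("game_id").isNotNull())
per_game = clean.groupBy("game_id").agg(F.count("play_id").alias("play_count"))

# Write a versioned Delta table so downstream training jobs get time travel.
per_game.write.format("delta").mode("overwrite").save("/data/curated/play_counts")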

Nice To Haves

  • Experience with sports, telemetry, or sensor data pipelines.
  • Familiarity with streaming frameworks (Kafka, Spark Structured Streaming, Flink); see the streaming sketch after this list.
  • General knowledge of American football, the NFL, and college football.
  • Background in data governance, lineage, and observability tools (Monte Carlo, Great Expectations, Unity Catalog, OpenLineage).
  • Experience with cloud infrastructure (AWS, GCP, or Azure) and containerization (Docker, Kubernetes).
  • Exposure to best practices in machine-learning model management and MLOps.
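
For the streaming tools listed above, a minimal Spark Structured Streaming sketch might look like the following. The Kafka broker, topic, and sink paths are placeholders, and reading from Kafka assumes the spark-sql-kafka connector is on the classpath.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("tracking-stream-sketch").getOrCreate()

# Read a live telemetry feed from Kafka (broker and topic are placeholders).
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "player-tracking")
    .load()
)

# Kafka delivers raw bytes; cast the payload to a string for downstream parsing.
decoded = events.selectExpr("CAST(value AS STRING) AS payload")

# Land micro-batches in a raw zone; the checkpoint makes the stream restartable.
query = (
    decoded.writeStream.format("parquet")
    .option("path", "/data/raw/tracking_stream/")
    .option("checkpointLocation", "/data/checkpoints/tracking_stream/")
    .start()
)
query.awaitTermination()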

Responsibilities

  • Build and operate robust data pipelines for ingestion, cleaning, and transformation using Databricks, Airflow, or Dagster.
  • Develop efficient ETL/ELT workflows in Python and SQL to support both batch and streaming workloads; a batch sketch follows this list.
  • Collaborate with ML and AI teams to deliver high-quality datasets for training, evaluation, and production features.
  • Model and maintain structured data assets (Delta, Parquet, Iceberg) for reliability, versioning, and lineage tracking.
  • Implement orchestration and monitoring — schedule jobs, track dependencies, and automate recovery from failures.
  • Ensure data quality and compliance through validation frameworks, schema enforcement, and audit logging.
  • Contribute to data platform evolution — evaluate tools, standardize best practices, and improve developer experience.
  • Support performance and cost optimization across compute, storage, and orchestration systems.
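
As one hedged illustration of the batch responsibilities above, here is a minimal Airflow 2.x DAG using the TaskFlow API, with an explicit quality gate before the load. All paths, dataset names, and the validation rule are hypothetical, not SumerSports specifics.

from datetime import datetime

import pandas as pd
from airflow.decorators import dag, task


# The "schedule" parameter assumes Airflow 2.4+; older versions use schedule_interval.
@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def plays_etl():
    @task
    def extract() -> str:
        # Pull the latest raw feed into a staging file (source path is a placeholder).
        raw = pd.read_json("/data/raw/plays/latest.json")
        staged = "/tmp/plays_raw.parquet"
        raw.to_parquet(staged)
        return staged

    @task
    def transform(path: str) -> str:
        # Basic cleaning: drop rows missing identifiers.
        df = pd.read_parquet(path).dropna(subset=["play_id", "game_id"])
        out = "/tmp/plays_clean.parquet"
        df.to_parquet(out)
        return out

    @task
    def validate_and_load(path: str) -> None:
        df = pd.read_parquet(path)
        # Quality gate: fail the run rather than load duplicate plays downstream.
        if df["play_id"].duplicated().any():
            raise ValueError("duplicate play_id rows detected")
        df.to_parquet("/data/curated/plays.parquet")  # placeholder destination

    validate_and_load(transform(extract()))


plays_etl()

A Dagster version would express the same flow as software-defined assets; either way, the point is that the validation step runs before anything lands in the curated zone.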

Benefits

  • Competitive salary and bonus plan
  • Comprehensive health insurance plan
  • 401(k) retirement savings plan with company match
  • Remote working environment
  • A flexible, unlimited time off policy
  • Generous paid holiday schedule: 13 in total, including the Monday after the Super Bowl