Data Engineer, League Analytics & Infrastructure

Major League Baseball
New York, NY
$115,000 - $140,000

About The Position

Major League Baseball is scaling the data platform that powers America's pastime, and we need a builder to help us do it. The League Analytics & Infrastructure (LAI) team operates the cloud foundation behind every analytics decision at MLB, from Statcast player-tracking pipelines processing millions of pitch-level events to the baseball operations data that informs decisions for 30 Clubs and the Commissioner's Office. We are continuing a multi-year evolution of our GCP-native lakehouse: building out our dbt transformation layer, hardening our Airflow orchestration, and pushing more of our infrastructure into code. You will be a core contributor to that build. The systems you create will be the backbone of analytics products used across the league, and your work will be visible to engineers, analysts, and decision-makers at every level of the organization.

This is a hands-on data engineering role focused on execution. Reporting to the Manager of BI Data Engineering, you'll join a small, high-performing team within LAI that values careful craftsmanship, rolls up its sleeves, and treats data as a product rather than a byproduct. You'll work upstream of our analytics engineers and analysts, designing the pipelines, models, and infrastructure they rely on every day.

We are looking for someone with production experience who can hit the ground running, but also someone who's hungry to grow into the next level. "Delivering" is the aim of the game: shipping reliable pipelines, optimizing data models for the analysts and engineers downstream, and making the platform a little better with every pull request. Beyond that, we want someone who reads the codebase critically, asks why something was built a certain way, and brings ideas about tooling, architecture, or process that push the team forward. The standards are high, the autonomy is real, and the work is visible across the league.

Requirements

  • 2–4 years of production data engineering experience
  • Expert-level SQL — comfortable writing complex freehand queries (sub-queries, nested logic, window functions) and reading someone else's to spot issues
  • Strong Python for data processing, scripting, and automation
  • Hands-on dbt experience — you've built models across staging, intermediate, and mart layers, written tests, and shipped to production
  • Production Airflow experience — DAG authoring, dependency management, debugging failed runs
  • Deep familiarity with Google Cloud Platform (BigQuery, GCS, Pub/Sub) or equivalent depth in AWS/Azure with willingness to convert
  • Git-based development workflows — branches, PRs, code review as a daily practice
  • You communicate clearly with both engineers and non-engineers, take feedback well, and give it kindly
  • Execution mindset. You can own a project from requirements to deployment with minimal oversight.

Nice To Haves

  • A degree in Computer Science, Engineering, or a related field — or non-traditional background with equivalent practical experience
  • Experience with Terraform or other Infrastructure-as-Code tools
  • Experience with AI-assisted development or enterprise AI tooling (Gemini Enterprise, Vertex AI). We're early but ambitious — we see AI as a lever for engineering efficiency
  • A passion for baseball, or prior experience in sports, media, or entertainment
  • Ability to build creative solutions for unusual problems

Responsibilities

  • Build production-grade pipelines using Airflow and dbt to orchestrate batch and streaming transformations across GCP, so that downstream analysts and engineers can trust the data they query without checking the wiring
  • Architect clean, layered data models (staging → intermediate → mart) that serve as the single source of truth for league analytics, applying dbt best practices for materialization, testing, and documentation
  • Operate the ingestion layer using Pub/Sub, GCS, Dataflow, and Knowledge Catalog (Dataplex) to land both batch and streaming sources cleanly into the lakehouse
  • Implement observability and monitoring standards so that data quality issues surface before stakeholders notice them, not after
  • Manage code through GitHub-based CI/CD, contributing to the deployment workflows that keep our platform reliable and our changes safe
  • Adhere to data governance practices that keep proprietary baseball data secure and compliant

Benefits

  • Competitive Benefits Package
  • Company Contributed 401K Plan
  • Paid Time Off and Holidays
  • Paid Parental Leave
  • Access to Free Tickets to Baseball Games & MLB.TV
  • Discounts at MLB Store | MLBShop.com
  • Employee Assistance Programs (EAP)
  • Onsite/Online Training & Development Programs
  • Tuition Reimbursement
  • Disability Benefits (short term and long term)
  • Life and Accidental Death Insurance
  • Pet Insurance