Data Engineer (Special Projects)

MLB (Job Board Only), New York, NY

About The Position

The League Analytics and Infrastructure (LAI) team sits at the intersection of modern analytics technology and operational strategy. It designs, deploys, and maintains the analytic cloud infrastructure that powers Major League Baseball's (MLB) advanced analytics operations, ensuring data accessibility, scalability, and performance for critical decision-making across the league.

As a Data Engineer, you will be a hands-on contributor focused on the technical execution of data-centric projects and initiatives. The position reports to a Data Engineering Lead and works within a dedicated sub-team of LAI to implement and manage the data pipelines and storage solutions that support MLB's analytical demands. Key responsibilities include developing data pipeline processes, ensuring data quality, and collaborating with other engineers and analysts to provision high-quality, reliable data sets.

Requirements

  • 1-2 years of professional data engineering experience in a production environment.
  • SQL: High-level proficiency; ability to write complex freehand SQL.
  • Python: High-level scripting ability for data processing and automation.
  • Orchestration: Experience with Airflow (DAG creation, management).
  • Cloud Native: Familiarity with Google Cloud Platform (GCP) or equivalent AWS/Azure services.
  • Infrastructure as Code: Experience with Terraform.
  • Containerization: Familiarity with Docker and Kubernetes.
  • dbt: Hands-on experience with dbt (data build tool).

Nice To Haves

  • Familiarity with BI platforms (Looker, Tableau) to anticipate downstream needs.
  • A passion for baseball or previous experience in sports/media/entertainment.

Responsibilities

  • Pipeline Development: Build and maintain production-grade data pipelines using Airflow and dbt to orchestrate transformations within Google Cloud Platform (GCP).
  • Data Modeling: Design, build, and maintain clean, reliable data models that serve as the single source of truth for downstream engineers and analysts.
  • Cloud Operations: Utilize GCP services (Pub/Sub, GCS, Dataflow, DataPlex) to handle batch and streaming data ingestion from internal and external sources.
  • CI/CD & Governance: Manage code deployment via GitHub and adhere to data governance best practices to ensure security and compliance.


What This Job Offers

  • Job Type: Full-time
  • Career Level: Entry Level
  • Education Level: No Education Listed
  • Number of Employees: 251-500 employees
