
As a Senior Data Engineer on the Revenue Data & Analytics team at GitHub, you will build and maintain the governed data infrastructure that powers how GitHub understands its revenue. You will work at the intersection of data engineering, data modeling, and analytics engineering — designing pipelines, building dimensional models, and implementing data quality frameworks that serve as the single source of truth for Copilot and broader revenue analytics. We are building a metadata-driven, medallion-architecture governance layer built on Microsoft Fabric, dbt, and Delta Lake. This role is foundational to that effort. You will be dedicated to transforming how GitHub governs, models, and delivers revenue data at scale. We are looking for engineers who think in models, not just pipelines. People who understand that a well-governed dim_account is worth more than a thousand ad-hoc joins. If you care about data quality as much as data delivery, about SCD2 as much as SLAs, and about building things right — this is your role.

Requirements

  • 6+ years experience in Software Engineering, Computer Science, or related technical discipline with proven experience maintaining and delivering production software, coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Go, Ruby, Rust, or Python
  • OR Associate's Degree in Computer Science, Electrical Engineering, Electronics Engineering, Math, Physics, Computer Engineering, or related field AND 5+ years experience in Software Engineering, Computer Science, or related technical discipline with proven experience maintaining and delivering production software, coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Go, Ruby, Rust, or Python
  • OR Bachelor's Degree in Computer Science, Electrical Engineering, Electronics Engineering, Math, Physics, Computer Engineering, or related field AND 4+ years experience in Software Engineering, Computer Science, or related technical discipline with proven experience maintaining and delivering production software, coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Go, Ruby, Rust, or Python
  • OR Master's Degree in Computer Science, Electrical Engineering, Electronics Engineering, Math, Physics, Computer Engineering, or related field AND 2+ years experience in Software Engineering, Computer Science, or related technical discipline with proven experience maintaining and delivering production software, coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Go, Ruby, Rust, or Python
  • OR Doctorate in Computer Science, Electrical Engineering, Electronics Engineering, Math, Physics, Computer Engineering, or related field
  • OR equivalent experience.
  • 5+ years SQL experience.

Nice To Haves

  • SQL fluency — window functions, CTEs, merge statements, query optimization; you think in sets, not loops (a short sketch follows this list)
  • Hands-on dbt experience (Core or Cloud) — models, tests, macros, Jinja, incremental materializations; dimensional modeling (Kimball star schemas, SCD2, conformed dimensions) a strong plus
  • Orchestration experience (Airflow, Prefect, Dagster, or similar) for scheduling, dependencies, and error handling
  • Cloud data platform experience — Azure preferred (Fabric, ADLS, Synapse), though AWS/GCP experience transfers; familiarity with Delta Lake, Apache Iceberg, or Spark is a bonus
  • Docker, Git-based workflows, and CI/CD for data pipelines; Python or equivalent for engineering tasks
  • Data quality tooling (Soda, dbt Elementary) and catalog/lineage tools (Purview, Atlan, DataHub, or similar)
  • Familiarity with advanced patterns — medallion architecture, Data Vault 2.0, metadata-driven frameworks, or federated query engines (Trino/Presto)
  • Experience with revenue, finance, or billing data — ARR, consumption models, hierarchy attribution, and account ownership complexity
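
For illustration, here is the kind of set-based SQL the first bullet points at: a CTE ranks raw events with a window function, and a MERGE upserts the latest row per account. This is a hypothetical, T-SQL-flavored sketch — the table and column names are illustrative, not GitHub's, and MERGE details vary slightly by dialect.

    -- Illustrative names only: raw.subscription_events and
    -- analytics.account_subscriptions are hypothetical tables.
    WITH ranked_events AS (
        SELECT
            account_id,
            plan,
            monthly_revenue,
            updated_at,
            ROW_NUMBER() OVER (
                PARTITION BY account_id
                ORDER BY updated_at DESC
            ) AS rn
        FROM raw.subscription_events
    )
    MERGE INTO analytics.account_subscriptions AS t
    USING (SELECT * FROM ranked_events WHERE rn = 1) AS s
        ON t.account_id = s.account_id
    WHEN MATCHED THEN UPDATE SET
        plan = s.plan,
        monthly_revenue = s.monthly_revenue,
        updated_at = s.updated_at
    WHEN NOT MATCHED THEN INSERT (account_id, plan, monthly_revenue, updated_at)
        VALUES (s.account_id, s.plan, s.monthly_revenue, s.updated_at);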

Responsibilities

  • Design, build, and maintain dbt models across medallion layers (bronze/silver/gold) in Microsoft Fabric Lakehouse and Warehouse, following Kimball dimensional modeling patterns — including SCD2 dimensions, incremental CDC pipelines, and metadata-driven approaches to minimize code duplication (a minimal SCD2 snapshot sketch appears after this list)
  • Author and enforce data quality checks and dbt tests across pipeline stages to catch anomalies before they reach downstream consumers; contribute to data cataloging and lineage to ensure governed datasets are discoverable and traceable (a singular-test example appears after this list)
  • Develop and maintain Airflow DAGs for orchestration — scheduling, dependency management, error handling, and alerting
  • Containerize data workloads with Docker and deploy via GitHub Actions CI/CD pipelines, including automated testing, linting, and environment promotion (dev → staging → prod)
  • Manage and optimize ADLS Gen2 and Delta Lake storage — partitioning, compaction, retention policies, and cost management (a maintenance sketch appears after this list)
  • Collaborate with analytics engineers, BI developers, and analysts to ensure gold-layer datasets serve Power BI, Trino, and downstream reporting needs
  • Participate in architecture reviews and contribute to ADRs; support migration from legacy patterns toward a governed, metadata-driven platform with pragmatism about transition paths
  • Own operational excellence across data pipelines — monitoring, alerting, incident response, and proactive detection of data drift, schema changes, and quality regressions
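
Since SCD2 comes up repeatedly, a minimal dbt snapshot sketch may help ground it. Snapshots are one standard dbt mechanism for materializing type-2 history; the schema, source, and column names here are assumptions, not GitHub's actual models.

    {% snapshot dim_account_snapshot %}

    {{
        config(
            target_schema='silver',
            unique_key='account_id',
            strategy='timestamp',
            updated_at='updated_at'
        )
    }}

    -- Hypothetical source; dbt adds dbt_valid_from / dbt_valid_to
    -- columns to track each account's history as type-2 rows.
    select
        account_id,
        account_name,
        owner_id,
        segment,
        updated_at
    from {{ source('crm', 'accounts') }}

    {% endsnapshot %}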
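On the data quality side, the simplest dbt check is a singular test: a SQL file under tests/ that must return zero rows, or dbt fails the run. A hypothetical guard against negative revenue amounts (fct_revenue is an assumed model name):

    -- tests/assert_no_negative_revenue.sql
    -- Any rows returned here fail the dbt test.
    select
        invoice_id,
        amount_usd
    from {{ ref('fct_revenue') }}
    where amount_usd < 0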
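And for the storage bullet, Delta Lake exposes routine table maintenance through SQL. A sketch of the two usual commands, with an assumed table name and clustering column; availability of Z-ordering depends on the engine's Delta Lake version.

    -- Compact small files; Z-order on a commonly filtered column.
    OPTIMIZE silver.fct_revenue
    ZORDER BY (account_id);

    -- Drop unreferenced files older than the retention window (7 days here).
    VACUUM silver.fct_revenue RETAIN 168 HOURS;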

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: Associate degree
  • Number of Employees: 1,001-5,000 employees
