Master Data Management & Data Quality Intern

GitHub, Inc.
11h | $32 - $84 | Remote

About The Position

The Revenue Operations & Data Governance team is building a Master Data Management (MDM) foundation to establish a trusted, unified view of our customers and accounts. As our MDM & Data Quality Intern, you will work directly alongside a Solution Architect to design and validate our MDM Proof of Concept, focused on a single, well-scoped entity domain (Accounts). Your work will lay the analytical and documentary groundwork that carries this initiative from exploration into a production-ready recommendation. This is a hands-on, high-impact role where your findings and deliverables will directly shape GitHub's long-term data strategy. This is a remote summer internship lasting 12 consecutive weeks, with start dates between May 18 and June 15, 2026.

Requirements

  • Currently pursuing a Master's Degree in Data Management, Information Systems, Data Analytics, or a related field, with at least one quarter/semester remaining after the internship.
  • Expected conferral date between December 2026 and August 2027.
  • Foundational understanding of data quality, data modeling, or MDM concepts through coursework or project experience.
  • Comfortable working with SQL and exploring relational or CRM datasets.

Nice To Haves

  • Familiarity with CRM data structures (Salesforce or similar) and common data quality challenges like duplicates, incomplete records, or inconsistent formatting.
  • Exposure to MDM concepts such as entity resolution, match/merge logic, or survivorship rules, even in an academic context.
  • Strong analytical thinking, able to investigate messy data, identify patterns, and form clear recommendations.
  • Excellent documentation skills, able to translate technical findings into clear, business-facing write-ups and process guides.
  • Collaborative and curious, comfortable asking questions and working within a structured mentorship model.

Responsibilities

  • Partner with the Solution Architect to audit and map Account data across source systems (CRM, billing, product), documenting field-level lineage, ownership, and quality gaps.
  • Support the design and testing of match/merge and survivorship rules for the Account entity, defining which source system wins for each attribute and why.
  • Assist in building and validating a sandbox POC Golden Record for the Account domain, including deduplication logic, confidence scoring, and a sample output dataset.
  • Measure and report on baseline data quality metrics (duplicate rates, completeness scores, and field-level accuracy) to establish a benchmark for MDM impact.
  • Document POC outcomes, key decisions, edge cases, and a clear handoff package to guide the production engineering team.
  • Develop a draft data stewardship process, including how records get flagged, reviewed, and approved, in collaboration with the Solution Architect and business stakeholders.