The Metropolitan Transportation Authority is North America's largest transportation network, serving a population of 15.3 million people across a 5,000-square-mile travel area surrounding New York City, Long Island, southeastern New York State, and Connecticut. The MTA network comprises the nation's largest bus fleet and more subway and commuter rail cars than all other U.S. transit systems combined. The MTA strives to provide a safe and reliable commute, excellent customer service, and rewarding opportunities.

The incumbent will help lead the team that designs, builds, tests, and delivers end-to-end, automated data pipelines across complex on-premises and off-premises platforms. They will extract structured, semi-structured, and unstructured data from multiple source systems and make it consistent, reliable, available, and usable for colleagues across the MTA and, in support of the agency's and New York State's Open Data goals, for external stakeholders and the general public. They will use languages such as SQL, Python, and R and relational database tools such as Oracle, Postgres, and SQL Server to analyze large datasets, build new ones, and design overall data architectures.

They will carefully document all work and collaborate closely with colleagues to define needs, solve problems, support the overall team agenda, and build relationships across the agency at all levels. They must be able to quickly learn the unique features, data constraints, and business needs of any part of the MTA. In addition, unlike other data engineering roles, this role supports the entire downstream pipeline process and, occasionally, the end users of the data products. More broadly, they will support the MTA's strategic goal of building data systems and processes that are well-structured and sustainable.