Lead Data Warehouse Engineer - Artificial Intelligence-Ready Mount Sinai (AIR.MS)

Mount Sinai Health System•New York, NY

About The Position

The Scientific Computing and Data team at the Icahn School of Medicine at Mount Sinai partners with scientists and clinicians to accelerate scientific discovery. The AI-Ready Mount Sinai (AIR.MS) platform contains patient data generated by clinical care processes across the Mount Sinai Health System. AIR.MS is a cloud-based, high-performance SAP HANA data platform that ingests electronic health record (EHR) data, formatted according to OHDSI's Observational Medical Outcomes Partnership (OMOP) model, from the Mount Sinai Data Warehouse (MSDW). It also contains metadata from, and links to, raw data sets in other modalities such as radiology, genomics, and pathology. Researchers can access AIR.MS data through direct database access, AI agents, cohort query tools, and the Minerva high-performance computer. AIR.MS is integrated with Minerva, which provides more than 21 petaflops of raw computational power, and with the raw radiology, genomics, and pathology data sets. An expert team of more than 20 PhD/MD computational scientists, biomedical informaticists, and computer scientists partners with researchers and clinicians to use these resources effectively and efficiently for translational science.

The Lead Data Warehouse Engineer is a senior technical specialist responsible for leading the ongoing integration of multi-modal clinical data into the AIR.MS data warehouse. The Lead Data Warehouse Engineer will assess the current state of data integration, identify integration opportunities based on researchers' priorities, develop a plan to integrate metadata and data, and execute that plan. The incumbent will work collaboratively with other members of the AIR.MS data platform and MSDW teams to lead technical efforts to integrate multi-modal data sets, expanding AIR.MS functionality. The AIR.MS data warehouse is built on the SAP HANA technology stack; MSDW is built on MySQL.

Requirements

Bachelor's degree in a technical discipline; master's degree preferred.
12-15 years of related experience preferred, including 8 years of experience designing, developing, and maintaining relational databases, data pipelines, and dimensional/OLAP warehouses.
Expert knowledge of data warehousing: 3NF and dimensional modeling (fact table types, slowly changing dimensions), change data capture, incremental loads, data lineage, source-to-target mappings, and pattern-based and parameter-driven development.
Expert-level experience with data engineering technologies: SQL, indexing, stored procedures, user-defined functions (UDFs), sequences, dynamic SQL, data transformation tools, and job orchestration tools for data processing.
Experience with DevOps/SDLC best practices, Agile (Scrum, Kanban) with Jira and Confluence, and version control with Git.
Strong communication and customer service skills for working with researchers, clinicians, administrators, and IT staff.
Excellent critical thinking, problem-solving, multitasking, and collaboration skills; ability to work independently in a fast-paced environment.
Experience with database administration: configuration, performance tuning, partitioning, materialized views, permissions, backups & restorations.

Nice To Haves

Experience with healthcare data (EHR, billing/claims, cost accounting), Epic Clarity/Caboodle, and clinical data models (OMOP, i2b2, PCORnet).

Responsibilities

Design databases and pipelines that balance functionality, performance, cost, and development time; evaluate technical options with the product manager.
Design, build, test, and maintain data pipelines that extract or capture data from source systems, transform and augment those data, and integrate them into a multi-modal data repository.
Serve as a team leader; contribute to project planning, work breakdown, dependency sequencing, and release management.
Develop and promote standards, conventions, design patterns, DevOps/SDLC best practices, and operational procedures for pipelines and warehouse maintenance.
Mentor junior engineers in data warehousing, data engineering skills, and operational support.
Design, build, and maintain data management processes, including loading flat files (CSV, TSV, pipe-delimited, JSON).
Lead design sessions, code walkthroughs, and peer reviews; produce technical documentation.
Tune database objects, stored procedures, and pipelines to optimize performance and minimize compute and storage costs.
Monitor database and pipeline operations; lead troubleshooting and remediation of failures; provide occasional after-hours on-call support.
Collaborate with DBAs and system administrators on backups, performance tuning, statistics/index maintenance, and patching.
Provide high-quality customer service to researchers, clinicians, and internal partners; maintain a science-driven, customer-focused approach.
Ensure patient privacy and data security in compliance with IRB and cybersecurity policies, HIPAA, 42 CFR Part 2, NYS Article 27-F, and other regulations.
Stay current with emerging technologies to improve capabilities, efficiency, quality, or cost.
Identify improvements in procedures, technology, compliance, and data privacy/security.
Periodically assist DBAs with user provisioning, backups, restorations, capacity planning, and performance monitoring.
Perform related duties as assigned.