Senior Data Engineer - Digital Pathology

Mayo Clinic • Rochester, MN

About The Position

The Digital Biology team is the advanced technology group for Mayo Clinic Digital Pathology. We are seeking a Senior Data Engineer to execute the technical vision for our shared engineering pod. In this role, you will build, deploy, and optimize the scalable, multimodal data pipelines (pathology, -omics, imaging) that feed our biological foundation models and AI Virtual Cells. Working directly with AI pods and bioinformaticians, you will take ownership of data reliability and velocity, transforming complex raw biological information into high-quality training assets used to develop life-changing diagnostic tools.

As part of an assigned product team, the Senior Data Engineer develops and deploys data pipelines, integrations, and transformations to support analytics and machine learning applications and solutions, using various open-source programming languages and vended software to meet the desired design functionality for products and programs. The position requires maintaining an understanding of the organization's current solutions, coding languages, and tools, and regularly requires the application of independent judgment. The engineer may provide consultative services to departments/divisions and leadership committees. Demonstrated experience in designing, building, and installing data systems, and an understanding of how they are applied to the Department of Data & Analytics technology framework, is required. The candidate will partner with product owners and Analytics and Machine Learning delivery teams to identify and retrieve data, conduct exploratory analysis, pipeline and transform data to help identify and visualize trends, build and validate analytical models, and translate qualitative and quantitative assessments into actionable insights.

Requirements

A Bachelor's degree in a relevant field such as engineering, mathematics, computer science, information technology, health science, or other analytical/quantitative field and a minimum of five years of professional or research experience in data visualization, data engineering, analytical modeling techniques; OR an Associate's degree in a relevant field such as engineering, mathematics, computer science, information technology, health science, or other analytical/quantitative field and a minimum of seven years of professional or research experience in data visualization, data engineering, analytical modeling techniques. In-depth business or practice knowledge will also be considered.
The incumbent must have the ability to manage a varied workload of projects with multiple priorities and stay current on healthcare trends and enterprise changes.
Interpersonal skills, time management skills, and demonstrated experience working on cross-functional teams are required.
Strong analytical skills, the ability to identify and recommend solutions, and a commitment to customer service are required.
The position requires excellent verbal and written communication skills, attention to detail, and a high capacity for learning and problem resolution.
Advanced experience in SQL is required.
Strong experience in programming and scripting languages such as Python, JavaScript, PHP, C++, or Java, and in API integration, is required.
Experience in hybrid data processing methods (batch and streaming) with tools such as Apache Spark, Hive, Pig, and Kafka is required.
Experience with big data, statistics, and machine learning is required.
The ability to navigate Linux and Windows operating systems is required.
Demonstrated experience in designing, building, and installing data systems, and an understanding of how they are applied to the Department of Data & Analytics technology framework, is required.

Nice To Haves

Knowledge of workflow scheduling (Apache Airflow, Google Cloud Composer), infrastructure as code (Kubernetes, Docker), and CI/CD (Jenkins, GitHub Actions) is preferred.
Experience in DataOps/DevOps and agile methodologies is preferred.
Experience with hybrid data virtualization such as Denodo is preferred.
Working knowledge of Tableau, Power BI, SAS, ThoughtSpot, Dash, D3, React, Snowflake, SSIS, and Google BigQuery is preferred.
Google Cloud Platform (GCP) certification is preferred.
The preferred candidate will have experience in SQL, Python, Google Cloud Dataflow (Apache Beam), and Google Cloud BigQuery.
The preferred candidate will also have the GCP Professional Data Engineer certification.

Responsibilities

Develop and deploy data pipelines, integrations, and transformations to support analytics and machine learning applications and solutions.
Maintain an understanding of the organization's current solutions, coding languages, and tools.
Provide consultative services to departments/divisions and leadership committees.
Partner with product owners and Analytics and Machine Learning delivery teams to identify and retrieve data, conduct exploratory analysis, pipeline and transform data to help identify and visualize trends, build and validate analytical models, and translate qualitative and quantitative assessments into actionable insights.
