Data Scientist III

Fred Hutchinson Cancer Center•Seattle, WA

About The Position

The Translational Data Scientist III develops, curates, and analyzes multimodal research datasets that integrate clinical, genomic, and other translational data modalities. This role focuses on building analytically ready datasets and supporting collaborative translational research projects under the guidance of senior scientific and technical leadership. Working closely with the Translational Data Science Staff Scientist, this position contributes to data harmonization, cohort construction, and cross-domain integration using institutional data platforms and modern data engineering practices. The role emphasizes technical development, structured learning, and applied collaboration with research teams and program-level efforts. This is a hands-on technical role situated at the interface of translational science, data engineering, and research collaboration. The role builds data science solutions, applies LLMs/AI to process, structure, and contextualize health data, and creates data products that are customized to the needs of our translational research programs at Fred Hutch, including clinical trials, Precision Oncology, disease-focused programs, and new data science capabilities both at Fred Hutch and across institutions via the Cancer AI Alliance.

At Fred Hutchinson Cancer Center, all employees are expected to demonstrate a commitment to our values of collaboration, compassion, determination, excellence, innovation, integrity, and respect.

Requirements

Master’s or PhD degree in Bioinformatics, Statistics, Biostatistics, Mathematics, Computer Science, Physics, or equivalent required, with a minimum of two years of related experience.
Core competency in at least one of the following: genomics, natural language, image processing, medical records or claims.
Proficiency in R or Python.
Knowledge of statistical analysis, machine learning, and predictive modeling.
Familiarity with a variety of data formats and markup languages (e.g., XML, JSON, R Markdown).
Experience with Unix/Linux and distributed computing.
Experience with visualization software and libraries (e.g., Shiny, JavaScript, D3).
Experience with code version control (Git, GitHub) and containers (Docker).
Proficiency in at least one common object-oriented programming language (e.g. Java, C++, C#).
Experience in application development, visualization, and user design.

Nice To Haves

3-5 years of related experience.
Experience working with clinical, genomic, imaging, or other biomedical research data, ideally in Databricks or a similar platform.
Demonstrated experience using Python, R, or SQL for data analysis and transformation.
Familiarity with structured data models and relational data environments.
Understanding of reproducible research or analytic workflows.
Ability to work collaboratively across scientific and technical teams.
Strong organizational and documentation practices.
Exposure to clinical data models such as OMOP or similar standardized healthcare data structures.
Experience working in a translational research or academic medical environment.
Familiarity with cloud-based research computing environments.
Experience supporting collaborative research projects or shared data resources.

Responsibilities

Identify and integrate disparate data sources, both internal and external, including clinical data, genomic data, imaging-derived data, and well-established, publicly available databases.
Develop and deploy machine learning algorithms, predictive models, and classification methods to advance cancer research and inform clinical decision making, applying reproducible data processing practices within cloud-based analytic environments.
Deliver novel, data-driven insights to improve outcomes in the treatment of cancer, supporting cohort definition, feature engineering, and dataset standardization.
Identify areas of growth for the data science initiative and actively engage in enhancing the breadth and reach of data science across the Fred Hutch campus.
Collaborate with faculty, researchers, and clinicians to identify high-impact opportunities for data science applications, translating research questions into structured data products and tools.
Manage data science projects from creation to completion, following established practices for data security, privacy, and compliance.
Communicate results to technical and non-technical audiences, contributing to documentation of datasets, assumptions, and transformation logic.