Data Delivery Specialist (Clinical & Multimodal Data Integration)

Roche•South San Francisco, CA

Hybrid

About The Position

Advances in AI, data, and computational sciences are transforming drug discovery and development. Roche’s Research and Early Development organizations at Genentech (gRED) and Pharma (pRED) have demonstrated how these technologies accelerate R&D, leveraging data and novel computational models to drive impact. Seamless data sharing and access to models across gRED and pRED are essential to maximizing these opportunities.

The Computational Sciences Center of Excellence (CS CoE) is a strategic, unified group that brings together data, AI, and computational expertise to accelerate innovation across gRED and pRED. Its goal is to harness the transformative power of data and Artificial Intelligence (AI) to help our scientists in both organizations deliver more innovative and life-changing medicines for patients worldwide. Within CS CoE, the Data and Digital Catalyst (DDC) organization leads the modernization of our data ecosystem, enabling scalable, data-driven science. The Data Capability organization within DDC is responsible for establishing foundational data capabilities, including data connectivity, data compliance, scientific content management, and data ingestion, curation, integration, and delivery. The team ensures that high-quality, well-structured datasets are available to power analytics, AI/ML, and scientific discovery across Research and Early Development.

We are seeking an Associate Data Delivery Specialist to support the integration and delivery of clinically anchored, multimodal datasets across sequencing, imaging, and proteomics domains. In this entry-level role, you will contribute to the ingestion, harmonization, and preparation of clinical data linked with diverse scientific modalities, enabling high-quality datasets for downstream analytics, AI/ML workflows, and translational research. Your work will directly support foundational data assets used in biological discovery and model development.

You will work within a cross-functional team of data engineers, analysts, and scientists to ensure that data is standardized, metadata-rich, and analysis-ready, forming a critical foundation for modern data-driven R&D.

Requirements

Bachelor’s or Master’s degree in Bioinformatics, Data Science, Biomedical Engineering, Computer Science, Clinical Sciences, or a related field and 0–2 years of experience working with clinical, biomedical, or scientific data.
Foundational knowledge of clinical data structures, including patient-level and longitudinal datasets.
Detail-oriented, with a strong focus on data quality and consistency, and motivated to learn and grow in a data-intensive, scientific environment.
Programming: Python (Pandas), SQL; familiarity with Bash is a plus.
Data Formats: Experience with structured data (CSV, JSON, Parquet); exposure to scientific formats (e.g., FASTQ, VCF, DICOM) is a plus.
Data Platforms: Exposure to cloud storage environments such as AWS S3 or Google Cloud Storage.
Tools: Familiarity with Jupyter notebooks and basic workflow tools (e.g., Airflow) is beneficial.

Nice To Haves

Exposure to clinical data standards such as CDISC (SDTM/ADaM), OMOP, or FHIR.
Experience or coursework involving multimodal datasets (e.g., clinical + omics or imaging).
Familiarity with metadata standards, ontologies, or controlled vocabularies.
Basic understanding of AI/ML data requirements and workflows.
Interest in applying data to translational research and drug discovery.

Responsibilities

Assist in the ingestion, validation, and harmonization of clinical datasets, including patient demographics, longitudinal records, and outcomes data.
Apply data standards and controlled vocabularies to improve consistency and usability.
Support the integration of clinical data with sequencing, imaging, and proteomics datasets to create coherent, analysis-ready multimodal datasets.
Prepare, validate, and document datasets for delivery to internal stakeholders across research, bioinformatics, and data science teams.
Perform quality control checks and support issue resolution in data workflows.
Assist in structuring datasets and metadata to support downstream analytics and machine learning use cases.
Contribute to early-stage AI-enabled data curation and harmonization efforts.
Work closely with data engineers, data scientists, and domain experts to support data integration efforts and ensure alignment with scientific and technical requirements.

Benefits

A discretionary annual bonus may be available based on individual and Company performance.
This position also qualifies for the benefits detailed at the link provided below.
