Sr. Data Delivery Specialist (Public and purchased data collection)

Roche•South San Francisco, CA

6d•Hybrid

About The Position

Advances in AI, data, and computational sciences are transforming drug discovery and development. Roche’s Research and Early Development organisations at Genentech (gRED) and Pharma (pRED) have demonstrated how these technologies accelerate R&D, leveraging data and novel computational models to drive impact. Seamless data sharing and access to models across gRED and pRED are essential to maximising these opportunities.

The new computational sciences Center of Excellence (CoE) is a strategic, unified group whose goal is to harness this transformative power of data and Artificial Intelligence (AI) to assist our scientists in both pRED and gRED to deliver more innovative and transformative medicines for patients worldwide. The Computational Sciences Center of Excellence (CS CoE) brings together data, AI, and computational expertise to accelerate innovation across gRED and pRED. Within CS CoE, the Data and Digital Catalyst (DDC) organization leads the modernization of our data ecosystem, enabling scalable, data-driven science. The Data Capability organization within DDC is responsible for establishing foundational data capabilities, including data connectivity, data compliance, scientific content management and data ingestion, curation, integration, and delivery. The team ensures that high-quality, well-structured datasets are available to power analytics, AI/ML, and scientific discovery across Research and Early Development.

We are seeking an Associate Data Delivery Specialist to support the delivery and operationalization of real-world data (RWD) and clinical-genomic datasets sourced from external partnerships and public/purchased data collections. In this entry-level role, you will contribute to the coordination, preparation, and delivery of multimodal, high-dimensional datasets, ensuring they are accessible, well-documented, and ready for use in research, analytics, and AI/ML workflows. You will also support interactions with external data providers and internal stakeholders to ensure efficient and compliant data usage. You will work within a cross-functional environment spanning data engineering, data science, and research teams, helping to enable data-driven discovery across Roche’s R&D ecosystem.

Requirements

PhD with 0-2 years of experience, Master’s degree with 3-5 years of experience, or Bachelor’s degree with 4-7 years of experience in Data Science, Bioinformatics, Health Informatics, Biomedical Engineering, Computer Science, or a related field, plus experience working with real-world data, clinical data, or biomedical datasets
Basic understanding of RWD sources (e.g., EHR, claims, registries, clinical-genomic datasets)
Strong attention to detail and commitment to data quality and reliability
Strong organizational and communication skills, with the ability to support multiple stakeholders
Programming: Python (pandas) or SQL; familiarity with Bash is a plus.
Data Formats: Experience with structured data (CSV, JSON, Parquet); exposure to scientific formats is a plus.
Data Platforms: Exposure to cloud environments (AWS S3, GCS, or Azure).
Tools: Familiarity with Jupyter notebooks, data portals, or workflow tools is beneficial

Nice To Haves

Exposure to clinical-genomic or multimodal datasets (e.g., Caris, FMI, or similar)
Familiarity with data governance and compliance in healthcare or life sciences
Exposure to AI/ML workflows or data preparation for analytics
Understanding of FAIR data principles and metadata standards
Interest in working with external data partnerships and large-scale data ecosystems

Responsibilities

Handle intake, tracking, and fulfillment of real-world data requests, including clinical-genomic and multimodal datasets.
Assist in preparing datasets for delivery, ensuring completeness, quality, and documentation.
Coordinate with external partners (e.g., Caris, FMI) to support data requests, query submissions, and data returns.
Assist in managing communications, timelines, and deliverables.
Assist in managing data access workflows, ensuring appropriate approvals, training, and compliance with data usage agreements.
Track data usage and maintain documentation.
Work with sequencing, imaging, and proteomics datasets, supporting standardized formatting, validation, and integration readiness.
Contribute to handling emerging multimodal data types and evolving standards.
Perform quality checks, metadata validation, and documentation to ensure datasets are analysis-ready.
Support troubleshooting of data delivery issues and escalate when necessary.
Contribute to early-stage efforts in AI-enabled data curation and harmonization, supporting improved scalability and efficiency in data delivery workflows.
Partner with internal teams (e.g., AIBT, CBM, gRED TM, pRED DTAs) to support data integration and delivery needs across diverse scientific use cases.

Benefits

A discretionary annual bonus may be available based on individual and Company performance.
This position also qualifies for the benefits detailed at the link provided below.
