Senior Machine Learning Engineer - BRAID Genomics

Roche • South San Francisco, CA

About The Position

A healthier future. It’s what drives us to innovate. To continuously advance science and ensure everyone has access to the healthcare they need today and for generations to come. Creating a world where we all have more time with the people we love. That’s what makes us Roche.

Advances in AI, data, and computational sciences are transforming drug discovery and development. Roche’s Research and Early Development organisations at Genentech (gRED) and Pharma (pRED) have demonstrated how these technologies accelerate R&D, leveraging data and novel computational models to drive impact. Seamless data sharing and access to models across gRED and pRED are essential to maximising these opportunities.

The new Computational Sciences Center of Excellence (CoE) is a strategic, unified group whose goal is to harness the transformative power of data and Artificial Intelligence (AI) to assist our scientists in both pRED and gRED to deliver more innovative and transformative medicines for patients worldwide.

The Opportunity

Genentech is seeking an exceptional Senior Machine Learning Engineer to join the BRAID (Biology Research | AI Development) team within our Computational Sciences organization. This role will focus on developing, optimizing and deploying novel machine learning methods with a strong emphasis on foundation models for genomics data modalities. You will build and productionize machine learning systems and foundation models for regulatory genomics, enabling sequence-to-function modeling, variant effect predictions, and nucleic acid design.

Requirements

PhD with 5+ years of professional software engineering experience (or equivalent), including ownership of production services or ML platforms.
Strong Python engineering skills (clean architecture, packaging, testing, CI, performance profiling).
Deep experience with PyTorch, including training workflows and debugging numerical/performance issues.
Experience with modern deep learning for sequences or high-dimensional biological data (transformers, representation learning, generative modeling).
Familiarity with regulatory genomics or functional genomics data and evaluation (e.g., expression/splicing/chromatin assays; variant effect prediction).
Excellent communication skills; ability to translate scientific needs into reliable software and measurable deliverables.

Nice To Haves

Experience with long-context sequence modeling and established genomic frameworks (e.g., Enformer/Borzoi-style models).
Distributed training and systems experience (multi-GPU, DDP/FSDP, mixed precision; GPU profiling/optimization).
MLOps experience (model registry, experiment tracking, deployment pipelines, monitoring/drift).
Experience with single-cell and scverse ecosystem tools (scanpy/anndata, etc.).
Publications or open-source contributions in ML + genomics are a plus.

Responsibilities

Train, fine-tune, evaluate, benchmark, and deploy DNA/RNA sequence-to-function models.
Design scalable data and training pipelines (distributed training, efficient dataloading, reproducibility, experiment tracking).
Build and maintain production-grade inference systems (APIs/SDKs, latency/cost optimization, reliability, monitoring).
Establish engineering best practices for ML codebases: CI, unit/integration tests, model versioning, documentation, and code reviews.
Collaborate closely with computational biologists, wet-lab partners, and platform teams to define requirements, success metrics, and adoption pathways.
