Machine Learning Scientist, Scientific Reasoning Models, AI for Drug Discovery

Genentech • New York, NY

About The Position

A healthier future. It’s what drives us to innovate. To continuously advance science and ensure everyone has access to the healthcare they need today and for generations to come. Creating a world where we all have more time with the people we love. That’s what makes us Roche.

Advances in AI, data, and computational sciences are transforming drug discovery and development. Roche’s Research and Early Development organisations at Genentech (gRED) and Pharma (pRED) have demonstrated how these technologies accelerate R&D, leveraging data and novel computational models to drive impact. Seamless data sharing and access to models across gRED and pRED are essential to maximising these opportunities. The new Computational Sciences Center of Excellence (CoE) is a strategic, unified group whose goal is to harness the transformative power of data and Artificial Intelligence (AI) to help our scientists in both pRED and gRED deliver more innovative and transformative medicines for patients worldwide.

At Roche’s AI for Drug Discovery (AIDD) group, we are revolutionizing drug discovery with cutting-edge machine learning (ML) techniques. We are seeking a Machine Learning Scientist to join the Foundation Models team within Prescient Design (gRED). In this role, you will contribute to our internal reasoning Large Language Models (LLMs) and enable them to succeed at relevant drug discovery tasks, including biomolecular design. You will work at the intersection of engineering and research, designing and scaling large machine learning systems.

In this role, you will:
Scalable Systems & Engineering: Design, implement, and improve large-scale distributed machine learning systems, writing robust, performance-critical code and contributing to core infrastructure.
Model Improvement & Reasoning: Develop and execute strategies to systematically improve performance on scientific tasks, including long-horizon task completion and complex reasoning challenges.
Domain Translation: Translate biological and chemical domain knowledge into concrete machine learning objectives, training signals, and evaluation criteria.
Evaluation & Benchmarks: Design and implement evaluation methodologies to assess model capabilities relevant to biological research, working with domain experts to establish benchmarks and curate high-quality data.
Research-to-Production: Collaborate closely with researchers to translate ideas and prototypes into scalable, production-ready systems.

As a Machine Learning Scientist:
Focus: You focus on the execution of defined projects. You are responsible for writing clean, efficient code to test specific hypotheses regarding reasoning and alignment.
Engineering: You contribute to the maintenance of the training infrastructure and data pipelines, ensuring experiments run reliably on our clusters.
Collaboration: You work closely with senior scientists to implement novel algorithms, translating research papers into working prototypes.

Requirements

BS/MS in Computer Science, Statistics, Mathematics, Physics, or a related quantitative field with 2+ years of relevant work experience, or a PhD with 0-2 years of relevant work experience.
LLM Expertise: Experience developing and training large-scale machine learning models, including post-training techniques to enhance domain knowledge, reasoning capabilities, and model alignment.
Publication Record: A strong history of research excellence at top-tier venues (e.g., NeurIPS, ICLR, ICML).
Engineering: Strong software engineering skills and experience working with high-performance computing systems.

Nice To Haves

Experience with molecular modalities (e.g., protein sequences, chemical graphs, and structured molecular data).
A public portfolio of research or significant contributions to open-source ML libraries.
A passion for applying frontier AI to drug discovery.

Responsibilities

Design, implement, and improve large-scale distributed machine learning systems, writing robust, performance-critical code and contributing to core infrastructure.
Develop and execute strategies to systematically improve performance on scientific tasks, including long-horizon task completion and complex reasoning challenges.
Translate biological and chemical domain knowledge into concrete machine learning objectives, training signals, and evaluation criteria.
Design and implement evaluation methodologies to assess model capabilities relevant to biological research, working with domain experts to establish benchmarks and curate high-quality data.
Collaborate closely with researchers to translate ideas and prototypes into scalable, production-ready systems.
Focus on the execution of defined projects.
Write clean, efficient code to test specific hypotheses regarding reasoning and alignment.
Contribute to the maintenance of the training infrastructure and data pipelines, ensuring experiments run reliably on our clusters.
Work closely with senior scientists to implement novel algorithms, translating research papers into working prototypes.
