Senior Machine Learning Engineer, AI Agents for Science, AI for Drug Discovery

Roche•New York, NY

$160,100 - $310,800

About The Position

A healthier future. It’s what drives us to innovate. To continuously advance science and ensure everyone has access to the healthcare they need today and for generations to come. Creating a world where we all have more time with the people we love. That’s what makes us Roche.

Advances in AI, data, and computational sciences are transforming drug discovery and development. Roche’s Research and Early Development organisations at Genentech (gRED) and Pharma (pRED) have demonstrated how these technologies accelerate R&D, leveraging data and novel computational models to drive impact. Seamless data sharing and access to models across gRED and pRED are essential to maximising these opportunities. The new Computational Sciences Center of Excellence (CoE) is a strategic, unified group whose goal is to harness the transformative power of data and Artificial Intelligence (AI) to assist our scientists in both pRED and gRED to deliver more innovative and transformative medicines for patients worldwide.

The Opportunity

At Roche’s AI for Drug Discovery (AIDD) group, we are revolutionizing drug discovery with cutting-edge machine learning (ML) techniques. Our Foundation Models team builds large language models (LLMs) and agent platforms that enable next-generation scientific and biomedical applications across the drug-discovery pipeline. We are seeking exceptional scientists to join our efforts on one of the largest scientific agent and automation platforms in the industry. Our team focuses on developing and operating the foundational platform that transforms scientific knowledge and actions from thousands of world-class scientists into sharable, reusable tools, workflows, and agents, reshaping how drug discovery operates with large-scale in-house scientific use cases.

We are seeking a Senior Machine Learning Engineer to join the Foundation Models team and build the platform that lets autonomous scientific agents automate and accelerate drug discovery. You will partner with Machine Learning Scientists to engineer the distributed systems that allow models to plan workflows, interact with scientific software, and execute complex tasks. You will lead the design and implementation of core infrastructure components that bridge the gap between model inference and experimental data generation. You will work with drug discovery scientists to deploy the system in real drug discovery processes.

Requirements

BS/MS with 4–7+ years of experience, or PhD with 0–2+ years of relevant industry experience.
Agent Systems: Deep experience building and deploying complex LLM-based applications, with a focus on state management, tool execution, and reliable structured outputs.
Backend Engineering: Expert proficiency in Python and asynchronous programming (FastAPI, asyncio) with a strong background in distributed systems.
Infrastructure: Experience deploying ML systems in containerized environments and managing integrations with vector databases.
Technical Leadership: Proven ability to lead the technical delivery of complex components and mentor junior engineers on software best practices.
Mission Driven: You are motivated by the goal of improving human health and want your code to directly contribute to the discovery of new medicines.

Nice To Haves

Experience working with scientific data structures (e.g., molecular graphs, protein sequences) or cheminformatics tools.
Familiarity with the scientific software ecosystem (e.g., RDKit, Biopython).

Responsibilities

Design and build the distributed backend infrastructure for multi-agent systems, managing state, orchestration, and execution across our compute clusters.
Implement and standardize tool interfaces using the Model Context Protocol (MCP) to expose internal scientific packages (chemistry, biology, and informatics tools) as executable actions for models.
Engineer robust APIs and event-driven architectures to integrate agent workflows with experimental data pipelines and execution environments.
Deploy and scale agentic systems in production using modern cloud-native patterns, ensuring high availability and low-latency access for internal research teams.
Optimize system performance, including efficient context management (RAG), caching, and parallel execution of scientific tasks.
Drive engineering excellence by defining software standards, leading code reviews, and building reusable Python libraries for the broader team.
Collaborate closely with computational scientists and subject matter experts on designing and evaluating targeted agents for drug discovery.
Explore frontier research topics on the use of agents in scientific settings and publish the findings.
Train and evaluate the backbone Large Language Models (LLMs) to improve agent performance on scientific tasks.
