General DiffUSE Job Application

Astera Institute • Emeryville, CA

1d • $100,000 - $300,000 • Onsite

About The Position

DiffUSE is a project at the Astera Institute focused on developing open infrastructure for studying protein dynamics directly from experimental data. The project operates at the intersection of structural biology (crystallography, cryo-EM), modern machine learning, computational biophysics, and open scientific tooling. The team comprises computational biologists, ML researchers, software engineers, and program staff, collaborating across Astera, Radial, and partner institutions.

DiffUSE is expanding and is seeking individuals excited about contributing to its mission, even if a specific role isn't currently posted. Submissions are reviewed on a rolling basis, and candidates will be contacted if a suitable opportunity arises now or in the future.

The project is actively building around several key areas:

Computational and data science: diffraction data processing, structural data pipelines, multiconformer and heterogeneity analysis, data standards (mmCIF), macromolecular ensemble metrics, and machine learning research for macromolecules and biophysics (representation learning, ML on raw experimental data, 3D vision, geometric deep learning).

Dataset generation and open release: designing and running campaigns to generate large structural datasets, partnering with external collaborators to open and standardize existing datasets, and bridging experimental facilities, data producers, and the open-science community.

Software and infrastructure engineering: scientific data infrastructure, pipelines, and tooling, plus open-source release engineering, reproducibility, and developer experience.

Program and operations: program management, scientific coordination, communications, and open-science publishing and community building.

Requirements

Familiarity with protein biophysics and the experimental methods that generate structural data
High agency: identify what needs doing and move it forward without waiting for direction
Ability to drive projects and people, including collaborators outside your reporting line
Comfortable owning a problem end-to-end and unblocking collaborators
Strong commitment to open science and public-good infrastructure
Comfort working at the boundary between disciplines
Bias toward shipping, iteration, and rapid feedback
Clear written and verbal communication
Track record of independent work inside collaborative teams
Experience in computational and data science
Experience in diffraction data processing and structural data pipelines
Experience in multiconformer and heterogeneity analysis from large datasets
Experience with data standards work (mmCIF and related)
Experience with macromolecular ensemble metrics
Experience in machine learning research for macromolecules and biophysics
Experience in representation learning for protein dynamics
Experience with ML on raw experimental data rather than processed structures
Experience in 3D vision and geometric deep learning
Experience in designing and running campaigns to generate large structural datasets
Experience in partnering with external collaborators to open and standardize existing datasets
Experience in bridging experimental facilities, data producers, and the open-science community
Experience in scientific data infrastructure, pipelines, and tooling
Experience in open-source release engineering, reproducibility, and developer experience
Experience in program management, scientific coordination, or communications
Experience in open-science publishing and community building

Nice To Haves

Backgrounds in 3D vision and geometric deep learning are especially welcome

Responsibilities

Identify what needs doing and move it forward without waiting for direction
Drive projects and people, including collaborators outside your reporting line
Own a problem end-to-end and unblock collaborators
Ship, iterate, and gather rapid feedback
Contribute to open science and public-good infrastructure
Work at the boundary between disciplines
Generate large structural datasets
Partner with external collaborators to open and standardize existing datasets
Bridge experimental facilities, data producers, and the open-science community
Develop scientific data infrastructure, pipelines, and tooling
Engage in open-source release engineering, reproducibility, and developer experience
Manage programs and coordinate scientific efforts
Handle communications related to the project
Engage in open-science publishing and community building