Forward-Deployed Cheminformatician

Apheris

13d•Remote

About The Position

At Apheris, we are building the future of how AI is applied in pharmaceutical R&D. We enable leading pharmaceutical teams to discover and develop drugs faster. We host the industry’s largest federated data networks for drug discovery AI, spanning co-folding, ADMET, and antibody developability. Across these networks, models are trained on proprietary industry datasets to achieve higher performance and broader applicability while keeping data control and IP protected. We deliver these superior models through drug discovery applications that enable teams to run them at scale, further customize them, and integrate them into existing R&D workflows.

AI Structural Biology (AISB) Network: Pharmaceutical companies collaborate in the field of co-folding, structure-based binding affinity predictions and antibody design.
ADMET Network: Pharmaceutical and biotech companies collaborate to improve small-molecule property prediction and expand into further drug modalities.
Antibody Developability Network: Pharma partners collaborate to federate historical and purpose-built antibody developability datasets for secure ML training, without data leaving each partner’s environment.

About the role

We are looking for a Forward-Deployed Cheminformatician to own how binding data is prepared across our co-folding focused networks and initiatives. Binding data is the input that decides whether our co-folding and binding-affinity models perform in real drug programs. It arrives from pharma partners in heterogeneous shapes: different assay registries, different metadata, different chemical-representation standards, and different choices on qualifiers, replicates and censoring. We need someone who turns this into a repeatable, well-documented preparation pipeline that pharma representatives can run alongside us, and that scales to the public-data corpus we build for our own model training.

This is half engineering, half forward-deployed work. You will define the protocol, harden it with validators and scripts, integrate it into the Apheris products, run it with each new partner, and own the equivalent pipeline for the public binding-data corpus.

Requirements

BSc, MSc, PhD or equivalent in cheminformatics, computational chemistry, or a related field, plus 3+ years preparing biological assay data in a discovery setting.
Fluent in Python and RDKit.
SMILES normalization, tautomer / ionization / stereochemistry handling, and scaffold extraction are second nature, and you understand why each matters for activity cliffs and model training.
Hands-on experience curating quantitative binding assay data (KD, Ki, IC50, pIC50) and HTS data — censored values, qualifiers, duplicates, replicate aggregation, and assay metadata interpretation.
You write good engineering code: version control, tested modular scripts, and validators that return useful errors.
Comfortable working forward-deployed with pharma medicinal chemists and biologists: you can sit in a sense-check meeting, pull out what a column label actually means, and encode that back into the protocol.
You enjoy turning a messy ad-hoc cleaning job into a repeatable protocol others can run.

Nice To Haves

Practical familiarity with public binding-data sources (ChEMBL, BindingDB, PubChem BioAssay) and the gotchas in each.
Applied LLM tooling (Claude, Codex, Cursor) to accelerate data cleaning or metadata harmonization.
Worked across institutional data boundaries — federated, multi-party, or otherwise — where the data-preparation contract has to hold under partial visibility.
A publication record or open-source contributions in cheminformatics or quantitative pharmacology.

Responsibilities

Define and own the binding-data preparation protocol — data schema, small-molecule standardization, assay metadata model, value handling (KD, Ki, IC50, pIC50), qualifier and censored-value handling, duplicate and replicate aggregation.
Build the tooling that runs it: modular scripts, validators with actionable errors, and reusable pipelines that survive different pharma upstream systems (Dotmatics, Spotfire, in-house registries).
Work forward-deployed with pharma. Sit with their biologists and medicinal chemists, walk them through the protocol, sense-check what an assay column actually measures, and unblock retrieval.
Maintain the small-molecule representation pipeline — RDKit standardization, tautomer and ionization handling, stereochemistry preservation, and PAINS / frequent-hitter filtering.
Curate the public binding-data foundation (ChEMBL, BindingDB, PubChem BioAssay), prepared to the same standard, so our models train on the strongest public baseline anyone can assemble.
Hand the productized pipeline cleanly to engineering for scaling, and partner with ML to keep the data contract valid as models and networks evolve.

Benefits

Industry-competitive compensation, including early-stage virtual share options
Remote-first work
Wellbeing budget, mental health support, work-from-home budget, co-working stipend, and learning budget
Generous holiday allowance
Office Days at our Berlin HQ or a different European location (3x per year)
