Biohub is launching the Virtual Biology Initiative, a $500 million, five-year commitment to build predictive models of the human cell. This initiative will bring together leading institutions to generate multi-modal biological data at unprecedented scale to power the next generation of AI models for biology.

Our data science team defines the algorithms and processing approaches that turn raw biological measurements into rich representations that models can learn from. This includes designing data formats and representations optimized for AI use cases, building cost-aware processing pipelines, developing scalable QC and validation frameworks, creating agent-augmented curation tools, and building cross-modal entity resolution and semantic infrastructure.

We are seeking scientific leaders who understand biological measurement deeply, think creatively about representation, sampling, and tokenization strategies, and can translate that thinking into data representations that enable novel training architectures. You will work directly with scientists, computational biologists, data engineers, and AI researchers to define model inputs and biological evaluations. You will operate with broad scope and high autonomy, influencing roadmap decisions across teams while mentoring senior individual contributors. Success means creating and implementing data systems that are adaptive, interpretable, and scientifically grounded, accelerating progress toward robust biological frontier models and advancing human health.
Job Type
Full-time
Career Level
Senior