About The Position

Biohub is launching the Virtual Biology Initiative, a $500 million, five-year commitment to build predictive models of the human cell. This initiative will bring together leading institutions to generate multimodal biological data at unprecedented scale to power the next generation of AI models for biology.

Our data science team defines the algorithms and processing approaches that turn raw biological measurements into rich representations that models can learn from. This includes designing data formats and representations optimized for AI use cases, building cost-aware processing pipelines, developing scalable QC and validation frameworks, creating agent-augmented curation tools, and building cross-modal entity resolution and semantic infrastructure.

We are seeking scientific leaders who understand biological measurement deeply, think creatively about data representations, sampling, and tokenization strategies, and can translate that thinking into practical designs that enable novel training architectures. You will work directly with scientists, computational biologists, data engineers, and AI researchers to define model inputs and biological evaluations. You will operate with broad scope and high autonomy, influencing roadmap decisions across teams while mentoring senior individual contributors. Success means creating and implementing data systems that are adaptive, interpretable, and scientifically grounded, accelerating progress toward robust biological frontier models and advancing human health.

Requirements

  • 12+ years of experience (or PhD + 7 years) working with large-scale biological datasets, including ownership of end-to-end data products
  • Deep expertise in at least one of: (a) imaging data—microscopy, cell phenotyping, spatial biology, and the data characteristics of image-based biological measurement; or (b) genomics data—bulk and single-cell sequencing, functional genomics, epigenomics, transcriptomics, spatial biology, and/or multi-omics
  • Understanding of how to transform raw biological data into AI-ready datasets, including familiarity with scientific best practices, noise characteristics, batch effects, and quality assessment specific to your domain
  • Experience with tokenization strategies for non-text data (images, sequences, graphs, time series) or with creating data representations and feature engineering for machine learning in scientific or biological contexts
  • Strong expertise in data science and statistical modeling; familiarity with modern ML architectures (transformers, diffusion models, or similar) and how data representation choices affect learning
  • Strong computational skills; demonstrated ability to design robust, extensible data architectures
  • Excellent communication and leadership skills, with the ability to translate between biology, ML, and engineering audiences and align teams to deliver complex projects
  • Creative, first-principles thinking about how to structure data for learning

Responsibilities

  • Set technical vision and strategy for the design of data representations and tokenization strategies across biological data types—including imaging, sequencing, and multimodal data—that enable novel model architectures
  • Develop, deploy, and validate approaches for combining heterogeneous data modalities into unified training frameworks, designing for robustness to noise, bias, and batch effects
  • Evaluate model performance, identifying which biological signals are captured or lost and iterating to improve
  • Partner deeply with ML engineers and AI researchers to co-design datasets and optimize model training, evaluation, and generalization
  • Lead cross-functional initiatives spanning data engineering, infrastructure, science, and product, aligning technical execution with long-term scientific goals
  • Identify and drive new data acquisition and generation opportunities, from consortium partnerships to internal experimental pipelines
  • Serve as a technical mentor and leader, raising the bar for data science and ML rigor across the organization

Benefits

  • Generous employer match on employee 401(k) contributions
  • Paid time off to volunteer at an organization of your choice
  • Funding for select family-forming benefits
  • Relocation support for employees who need assistance moving