Postdoctoral Associate | Vertebrate Genome Laboratory

Rockefeller University | New York, NY

About The Position

The Vertebrate Genome Laboratory (VGL) at The Rockefeller University leads international efforts in vertebrate genome sequencing, assembly, annotation, and evolutionary analysis. As a core laboratory of the Vertebrate Genomes Project and a major hub of the Earth BioGenome Project, VGL is at the forefront of generating high-quality, telomere-to-telomere genome assemblies across vertebrate diversity.

The laboratory provides an end-to-end genomics platform spanning sample processing, sequencing, genome assembly, and downstream data analytics. It specializes in high-molecular-weight DNA extraction and long-read genomic technologies, offering integrated library preparation and sequencing services for high-molecular-weight genomic DNA, long amplicons, and full-length transcriptome sequencing (Iso-Seq). These capabilities are tightly coupled with in-house computational expertise for genome assembly, curation, annotation, and comparative genomics, and are supported by state-of-the-art platforms including PacBio and ONT.

Overview

We seek a Postdoctoral Associate with AI engineering skills to work as part of our new GAIA (Genomic Artificial Intelligence Applications) collaboration between The Rockefeller University, Revive & Restore, Cornell University, and partners including Google AI Genomics, funded by the Bezos Earth Fund and Google.org. The project aims to develop next-generation AI systems that scale genome assembly and enable genome-driven conservation.
Two central goals of GAIA are:

  • To build an AI Genome Curation Assistant (Jarvis): an AI system that combines modern machine learning approaches with large-scale biological data to automate genome curation by detecting, interpreting, and correcting structural errors, reducing manual effort from weeks to minutes and thereby allowing curation to scale to thousands of species per year
  • To build Genera: a multimodal, agentic AI system that integrates genomic and ecological data to assess extinction risk and generate actionable genetic rescue strategies for conservation practitioners

This role blends research and engineering, with an emphasis on building deployable AI systems while contributing to scientific publications. The successful postdoc will focus on the development of Jarvis and will also have the opportunity to contribute to the development of Genera.

The core technical challenge is to translate expert-driven genome curation workflows into learnable AI systems. This involves combining:

  • Pattern recognition
  • DNA sequence-level reasoning
  • Iterative decision-making across multi-step pipelines

The successful candidate will design and implement end-to-end AI pipelines integrating:

  • Transformer-based models for DNA sequence reasoning
  • Computer vision approaches for genomic data representations
  • Graph-aware or structured prediction methods for genome assembly

You will work across modeling, data, and systems integration to build a unified platform for AI-assisted genome assembly and curation, in close collaboration with VGL bioinformatics teams, genome curators, and the Rockefeller Data Science Platform, a resource center with AI developers. The role includes access to substantial compute resources and collaboration with leading AI partners.

Requirements

  • Ph.D. (or equivalent) in machine learning, computer science, computational biology, or a related field
  • Strong experience in machine learning and deep learning, including transformer-based or related architectures
  • Proven ability to build end-to-end AI systems, from modeling to integration
  • Proficiency in Python and ML frameworks (e.g., PyTorch, JAX, TensorFlow)
  • Experience with HPC or cloud environments and distributed training
  • Strong software engineering skills (modular design, testing, version control)
  • Ability to work effectively in interdisciplinary teams

Nice To Haves

  • Experience building multimodal models combining sequence, image, and structured data
  • Experience with computer vision or analysis of structured visual data
  • Familiarity with graph-based models or structured prediction methods
  • Experience with foundation models or large-scale pretraining
  • Familiarity with genomic data formats and pipelines (FASTA, BAM, Hi-C, assembly graphs)
  • Experience with MLOps/DevOps (Docker, Kubernetes, CI/CD)

Responsibilities

  • Design and implement AI models for genome assembly tasks, including:
      • Chromosome assignment and ordering
      • Detection of structural errors from Hi-C/contact maps
      • Suggestion of corrections using sequence-aware models
  • Develop multi-stage inference and decision pipelines integrating detection, reasoning, and correction
  • Build and optimize models combining multiple data modalities (sequence, Hi-C, annotations, assembly graphs)
  • Engineer training datasets and features from large curated genome collections (~3,000 species)
  • Develop scalable, production-quality pipelines and contribute to system architecture
  • Collaborate with domain experts to formalize and automate genome curation strategies
  • Contribute to scientific publications, software releases, and collaborative research