Biohub is leading the new era of AI-powered biology to cure or prevent disease through its 501(c)(3) medical research organization, with the support of the Chan Zuckerberg Initiative. Biohub supports the science and technology that will make it possible to help scientists cure, prevent, or manage all diseases by the end of this century. While this may seem like an audacious goal, in the last 100 years biomedical science has made tremendous strides in understanding biological systems, advancing human health, and treating disease. Achieving our mission will only be possible if scientists are able to better understand human biology. To that end, we have identified four grand challenges that will unlock the mysteries of the cell and how cells interact within systems, paving the way for new discoveries that will change medicine in the decades that follow:
Building an AI-based virtual cell model to predict and understand cellular behavior
Developing novel imaging technologies to map, measure, and model complex biological systems
Creating new tools for sensing and directly measuring inflammation within tissues in real time, to better understand inflammation, a key driver of many diseases
Harnessing the immune system for early detection, prevention, and treatment of disease
The Opportunity
The Data Pipelines team processes scientific datasets specifically designed to enable biological modeling and support AI research. It is responsible for data ETL, validation, testing, and storage, and partners with the data management team on retrieval. We handle over 89 million unique cells' worth of single-cell transcriptomic data and over 15 thousand cryoET tomograms in imaging datasets as large as 20TB and counting, and we will be expanding to support larger scale and additional imaging, sequencing, and literature modalities.
Our resources provide access to structured, open source data that tens of thousands of scientists use each month to quickly query and form hypotheses about how genetic variants in cells impact disease risk, to define drug toxicities, and eventually to discover better therapies.
As a software engineer on the Data Engineering team, you will contribute to architecture and help implement all of the data needs described above for our platforms, CELLxGENE Discover and CryoET, as well as the new platform we are building with a focus on data for AI and the virtual cell. The goal is to enable scientists to further interrogate our very large and growing corpus of data without needing to download the data itself or have any computational expertise. You will work on a collaborative, multidisciplinary team to develop solutions that speed up our scientist users' workflows and accelerate the pace of scientific discovery.
No prior biology experience is needed for this role. You will have the opportunity to pair with Computational Biologists to develop solutions for our users and to learn about biology from experts on our team.
Our tech stack: Python, Terraform, AWS infrastructure, Argo CD and Argo Workflows, TileDB.
Job Type
Full-time
Career Level
Mid Level