(Senior) Data Engineer

Flagship Pioneering•Boston, MA

5d•$74,000 - $176,000

About The Position

About ProFound Therapeutics

ProFound Therapeutics is pioneering the discovery of the expanded human proteome to unlock a new universe of potential therapeutics. By integrating multi-omics, advanced computation, and translational biology, we aim to reveal and characterize thousands of previously uncharted proteins and systematically explore their role in health and disease.

The Role

We're seeking a (Senior) Data Engineer to join our data team. This individual will play a central role in building the data foundation that powers ProFound's drug discovery platform. You'll architect systems that integrate diverse biological data (from genomics and proteomics to imaging and perturbation experiments), enabling our scientists to make breakthrough discoveries and our ML models to identify novel therapeutic targets. This role offers the unique challenge of working at the intersection of computational biology, machine learning, and modern data engineering, with the impact of accelerating life-saving therapeutics.

Requirements

BS, MS, or PhD in Computer Science, Bioinformatics, or related field with 0-4 years of professional data engineering experience.
Background in scientific domains (biology, chemistry, or related fields).
Python expertise including data science libraries and testing frameworks.
AWS experience with storage, database, compute, and analytics services (S3, RDS, DynamoDB, Redshift, Lambda, EC2, Batch, ECS, Glue, Athena).
Proven experience designing, deploying, and maintaining production data pipelines at scale.
Hands-on experience with workflow orchestration systems (AWS Step Functions, Nextflow, dbt, Dagster) and event-driven architectures.
Working knowledge of CI/CD frameworks, infrastructure-as-code (CloudFormation or AWS CDK), and containerization (Docker).
Strong technical communication skills, with the ability to translate complex technical concepts for scientific audiences and collaborate effectively across disciplines.
Demonstrated ability to thrive in dynamic environments, prioritize competing demands, and make pragmatic trade-offs in a fast-paced startup setting.

Nice To Haves

Experience with data lakes and open table formats (Iceberg preferred).
Experience with knowledge graph technologies and graph databases (Neo4j).
Familiarity with lab data management systems (LIMS, ELN, integrated data lakes).
Experience with MLOps practices and tools for model training pipelines, experiment tracking, and model deployment.
AWS certification (Associate or Professional level).

Responsibilities

Contribute to the design and scaling of our multi-modal data platform, which integrates public and proprietary biological data (genomics, transcriptomics, proteomics, imaging, perturbation data) across data lakes, graph databases, relational and NoSQL databases, and data warehouses to enable ML training, computational biology pipelines, and scientific exploration.
Build production data pipelines and workflows that automate data ingestion and transformation, working with domain experts to optimize analysis pipelines for scientific discovery.
Partner with computational and wet-lab scientists to model experimental data, manage instrument outputs and electronic lab notebook data, and ensure seamless integration into our data platform.
Develop and manage cloud infrastructure on AWS following best practices and the AWS Well-Architected Framework, with a focus on scalability, security, and cost optimization.
Contribute to the data engineering team’s best practices, including comprehensive documentation, monitoring and observability, and robust testing frameworks.
Collaborate with external partners, including CROs, vendors, and consultants, to coordinate data transfers and support platform integrations.