About The Position

Merge Labs is a frontier research lab with the mission of bridging biological and artificial intelligence to maximize human ability, agency, and experience. We’re pursuing this goal by developing fundamentally new approaches to brain-computer interfaces that interact with the brain at high bandwidth, integrate with advanced AI, and are ultimately safe and accessible for anyone to use.

About the team:

Sufficiently advanced BCIs can restore lost abilities, support healthier brain states, deepen our connection with each other, and expand what we can imagine and create alongside advanced AI. Our team is responsible for turning this vision into algorithms. Working across synthetic biology, neuroscience, device physics, signal processing, and machine learning, we design more effective ways to bridge human and artificial intelligence. We design experiments and analytical frameworks, collect data, train models, and optimize performance to build brain-AI systems that can scale to many people and many uses. We move with urgency, balancing creative exploration with engineering rigor, because expanding human ability, agency, and experience is one of the most important challenges of our time.

About the role:

As the senior-most data engineer on the team, you’ll define and own the pipelines that capture, process, and serve the data driving Merge’s molecular optimization platform. You’ll translate heterogeneous laboratory outputs into well-structured, queryable, schema-driven datasets that power scientific analysis and closed-loop ML. You’ll work directly with experimentalists to establish data standards and metadata conventions, and with ML engineers to make results available in production-grade systems. This role reports to the Head of Software and is highly cross-functional, spanning software engineering, data architecture, and scientific informatics.

As part of the Core Software team, you will be directly supported by infrastructure specialists, and you will work directly with the Application Development Lead to ensure that necessary scientific and user inputs are captured.

Requirements

  • 5–10+ years of experience building and operating data pipelines or backend systems in production.
  • Strong software fundamentals in Python, SQL, and data modeling; familiarity with C++, low-latency data pipelines, and on-premises deployments preferred.
  • Experience designing schemas and metadata frameworks for complex, evolving datasets.
  • Proven ability to partner with non-technical users to understand needs and ship usable systems.
  • Comfort owning systems end-to-end—from design and implementation to deployment and monitoring.

Nice To Haves

  • Background in computational biology, bioinformatics, or scientific data systems.

Responsibilities

  • Build and operate ingestion pipelines from laboratory instruments into centralized storage.
  • Design schemas and metadata capture standards for experimental data.
  • Implement post-processing pipelines that produce analysis-ready datasets for scientists.
  • Establish monitoring, alerting, and structured logging for both pipeline and data quality.
  • Partner with biologists to map experimental workflows to data models.
  • Build interfaces (APIs, dashboards, and LLM-enabled tools) that make data easily accessible.
  • Drive continuous improvement of data infrastructure as new protocols and data types emerge.