Data Scientist 2 - Computing, Analytics & Modeling

Pacific Northwest National Laboratory

About The Position

At PNNL, our core capabilities are divided among major departments that we refer to as Directorates. Each Directorate focuses on a specific area of scientific research or another function and has its own leadership team and dedicated budget. Our Science & Technology directorates include National Security, Earth and Biological Sciences, Physical and Computational Sciences, and Energy and Environment. In addition, PNNL hosts the Environmental Molecular Sciences Laboratory, a Department of Energy (DOE) Office of Science user facility housed on the PNNL campus.
The Earth and Biological Sciences Directorate (EBSD) leads critical research in four areas: Atmospheric, Climate & Earth Sciences; Biological Sciences; Environmental Molecular Sciences; and Global Change. Our vision is to develop a predictive understanding of biological and Earth systems in transition. We aim to understand energy and material flows within the integrated Earth system; to understand, predict, and control the response of biosystems to environmental and/or genomic changes; and to model the Earth system from the subsurface to the atmosphere.
The Environmental Molecular Sciences Division comprises 18 interdisciplinary research teams focused on deciphering the molecular-level interactions that drive biological and environmental processes across temporal and spatial scales. Combined with computational analysis and modeling, these findings contribute to a predictive understanding of how systems respond to environmental perturbations, enabling solutions to the nation’s energy, environmental, and human health challenges. The division also manages the Environmental Molecular Sciences Laboratory, which accelerates the research of scientists around the world by providing access to world-class expertise, instrumentation, and computational resources.
The Environmental Molecular Sciences Division’s (EMSD’s) Computing, Analytics, and Modeling (CAM) group advances the science of the Environmental Molecular Sciences Laboratory (EMSL) user facility and the mission science of its sponsor, the DOE Office of Science’s Office of Biological and Environmental Research (BER), by delivering world-class capabilities and developments in computational science, data analytics and transformations, and modeling sciences. The group, which reports to the CAM Group Leader, works with researchers and staff in EMSL’s other two science areas (Environmental Transformations and Interactions, or ETI, and Functional and Systems Biology, or FSB) to deliver on EMSL’s three strategic science objectives: DigiPhen (Digital Phenome), MONet (Molecular Observation Network), and MIDAS (Modeling, Integration, and Data Agents for Science). Because data and computing infrastructure systems are critical to the group’s work, CAM also works closely with the group led by EMSL’s Chief Data Officer.
CAM is seeking a motivated Data Scientist 2 to contribute to cutting-edge AI solutions for computational and modeling research across the BER mission space. The role requires experience designing and implementing AI-based agents and agentic workflows, along with a solid understanding of key tools such as LangChain, LangGraph, and Model Context Protocol (MCP). Candidates should have a proven ability to leverage AI to accelerate the software lifecycle and improve data exploration and retrieval, as well as experience supporting various stages of the data lifecycle, including data modeling, harmonizing data models, managing distributed or federated data, and organizational data governance.
Additional expertise in metabolic modeling techniques such as flux balance analysis and metabolic control analysis, along with familiarity with structural biology data (particularly cryo-electron tomography), is highly desirable. Knowledge of causal inference methods and their application to complex biological systems will further strengthen the candidate’s profile.

Requirements

  • BS/BA and 2 years of relevant experience -OR-
  • MS/MA -OR-
  • PhD

Nice To Haves

  • Degree in Computer Science, Electrical and Computer Engineering, Bioinformatics, Statistics, Physics, Mathematics or a related field.
  • Agentic AI & Tools: Design and implementation of single- and multi-agent AI workflows for scientific automation; proficiency with LangChain, LangGraph, and Model Context Protocol (MCP); experience with LLM reasoning frameworks (e.g., ReAct) and orchestration for data analysis and metabolic engineering.
  • Cryo-ET 3D Vision & Intelligent Retrieval: Expertise in 3D computer vision for cryo-electron tomography, topologically aware protein classification, sim-to-real transfer, and reconstruction-level signal characterization, paired with development of advanced search and retrieval systems for complex biological datasets and specialized workflows (e.g., post-translational modification discovery).
  • AI & Autonomous Science: Interest in foundation models, multi-agent systems, and autonomous science frameworks; experience applying computational and modeling approaches across the BER mission space.
  • Structural & Multiomic Data: Skilled in structural biology data analysis (especially cryo-electron tomography and protein classification workflows) and multiomic approaches for metabolic and circadian regulation.
  • Programming & Frameworks: Proficiency in Python, PyTorch, TensorFlow, and OpenCV; experience with version control (e.g., Git) and collaborative development practices.
  • Data & Model Development: Skilled in preparing data for machine learning, including signal processing and feature extraction for high-dimensional datasets; familiarity with HPC environments and distributed ML training.
  • Data Lifecycle Management & Engineering: Experience developing and maintaining open-source scientific software with CI/CD and containerized, reproducible HPC workflows; expertise in data modeling, harmonization, and management of distributed and federated data systems.
  • Domain-Specific AI Applications: Experience creating agentic workflows for scientific discovery and integrating AI into biological data analysis pipelines.
  • Biological Modeling & Analysis: Expertise in metabolic modeling (flux balance analysis, metabolic control analysis) and whole-cell modeling for spatio-temporal energy metabolism in microbial systems.
  • Advanced Causal Methods: Experience with causal inference and advanced causal analysis techniques for biosystems design (e.g., Causal Component Analysis, Y₀-based identification).
  • Problem-Solving & Adaptability: Strong ability to tackle complex scientific and data challenges with attention to detail; adaptable to emerging technologies and innovative AI-driven approaches.
  • Collaboration & Communication: Effective in interdisciplinary environments, with proven skills in written and verbal communication across computational, biological, and data science domains.
  • Leadership & Initiative: Demonstrated ability to identify opportunities, advocate for them, and integrate agentic AI with deterministic workflows (e.g., ADEPT-Bio framework).
  • Remote & Distributed Work: Proven success working in highly distributed teams and fostering collaboration in virtual settings.
  • Intellectual Curiosity: Enthusiasm for interdisciplinary research and continuous learning in advanced computational and biological sciences.

Responsibilities

  • Designs, develops, documents, tests, and debugs new and existing software systems, hardware/software interfaces, and/or applications according to industry-established software engineering principles and best practices.
  • Works collaboratively within a team to execute on the full system development lifecycle including analyzing user needs to determine technical requirements; developing technical specifications based on conceptual design and requirements; developing well-crafted and documented source code; integrating hardware using software; automating manual tasks; and consulting with the end user to prototype, configure, refine, test, and debug programs or systems to meet needs.
  • Identifies and evaluates new technologies or methods for implementation and continuous improvement.
  • Drives the design and implementation of agentic workflows for scientific automation, leveraging frameworks such as LangChain, LangGraph, and the Model Context Protocol (MCP).
  • Develops, maintains, and supports open-source scientific software, using CI/CD practices and containerized, reproducible workflows on HPC systems.
  • Advances cryo-ET data analysis capabilities through the integration of AI methods, physics-based simulations, and structural biology toolkits.
  • Expands systems biology modeling capabilities, including metabolic modeling, whole-cell modeling, and causal-reasoning approaches.
  • Communicates technical findings, including contributing to and leading the preparation of scientific outputs such as reports, manuscripts, visualizations, stakeholder presentations, and software.

Benefits

  • Employees and their families are offered medical insurance, dental insurance, vision insurance, robust telehealth care options, several mental health benefits, free wellness coaching, health savings account, flexible spending accounts, basic life insurance, disability insurance, employee assistance program, business travel insurance, tuition assistance, relocation, backup childcare, legal benefits, supplemental parental bonding leave, surrogacy and adoption assistance, and fertility support.
  • Employees are automatically enrolled in our company-funded pension plan and may enroll in our 401(k) savings plan with company match.
  • Employees may accrue up to 120 vacation hours per year and may receive ten paid holidays per year.