Machine Learning Staff Scientist at NSF-NCEMS

Penn State University, Benner Township, PA
Hybrid

About The Position

The U.S. National Science Foundation National Synthesis Center for Emergence in the Molecular and Cellular Sciences (NCEMS) and the Institute for Computational and Data Sciences (ICDS) at Penn State seek an outstanding scientist to fill a Machine Learning Staff Scientist (Research Data Scientist - Intermediate Professional or Advanced Professional level) position dedicated to advancing the collaborative research of the Center’s Working Groups. NCEMS is an interdisciplinary research Center positioned at the interface of data science with molecular and cellular biology. The Center provides leadership in the integration of diverse, publicly available datasets, enabling cross-disciplinary teams of scientists to synthesize knowledge and pursue fundamental questions at the forefront of the life sciences. Machine Learning Staff Scientists play a supporting role in the research efforts of the multidisciplinary scientific teams supported by NCEMS, typically contributing to 2-3 projects simultaneously.

Requirements

  • M.S. or PhD in Machine Learning, Computational Biology, Bioinformatics, Computer Science, Statistics, Data Science, or a related field is preferred.
  • Strong proficiency in Python for scientific computing and machine learning, including experience with common ML libraries/frameworks (e.g., PyTorch, TensorFlow, JAX, scikit-learn).
  • Demonstrated experience with, and understanding of, core machine learning, deep learning, and statistical methods such as: regression and generalized linear models; classification and clustering; dimensionality reduction; sequence and time-series modeling; deep learning architectures including CNNs, RNNs, GNNs, and transformers; generative modeling (e.g., diffusion, variational, and autoregressive approaches); representation learning and self-/weakly-supervised learning; natural language processing; computer vision; and causal inference.
  • Experience working with high-dimensional, large-scale molecular and cellular datasets (e.g., genomic, transcriptomic, epigenomic, proteomic, metabolomic/lipidomic, imaging-derived, single-cell, or multi-omics data), including appropriate preprocessing and normalization strategies for ML.
  • Solid understanding of molecular and cellular biology concepts sufficient to frame ML problems across the central dogma (sequence, expression, regulation, and protein function/structure) and to collaborate effectively with domain scientists.
  • Experience with software engineering practices for research-grade code, including version control (Git), reproducible environments (containers/conda), and HPC/GPU computing.
  • Publications in peer-reviewed journals demonstrating contributions to the field.
  • Experience supporting or contributing to multi-PI projects.
  • Commitment to ethical conduct and research integrity.
  • Strong work ethic.
  • Strong interpersonal and written communication skills.
  • Ability to work well in a team environment.
  • Applicants must be authorized to work in the U.S.

Responsibilities

  • Collaborate with NCEMS Working Groups to design, develop, and evaluate machine learning approaches for integrating, analyzing, and visualizing molecular and cellular biology data across the central dogma and regulatory processes.
  • Prepare ML-ready datasets by leading data wrangling, harmonization, standardization, quality control, and documentation to support robust training and reuse across biological modalities.
  • Develop end-to-end ML workflows (feature/representation learning, training, validation, benchmarking, and uncertainty quantification) for multi-omics and related data types.
  • Build and optimize predictive and generative models (e.g., deep learning, probabilistic models, foundation-model adaptation, graph/neural sequence models) to support synthesis research questions.
  • Implement scalable training and inference pipelines using modern ML tooling (e.g., PyTorch/TensorFlow/JAX), version control, containers, and HPC/GPU resources.
  • Support the publication of intermediate data products, models, code, and documentation.
  • Stay up to date with the latest advancements in machine learning, AI for biology, and the rapidly evolving landscape of public molecular and cellular datasets.

Benefits

  • Comprehensive medical, dental, and vision coverage
  • Robust retirement plans
  • Substantial paid time off which includes holidays, vacation and sick time
  • 75% tuition discount, available to employees as well as eligible spouses and children