Prima Mente • Posted 1 day ago
Full-time • Mid Level
San Francisco, CA

Prima Mente’s goal is to deeply understand the brain, to protect it from neurological disease, and to enhance it in health. We do this by generating our own data, building brain foundation models, and translating discoveries into real clinical and research impact.

Role focus: Foundation Models for Biology

Architect, build, and scale our foundational AI infrastructure. You'll ensure our ML models are developed and deployed on highly performant, scalable, and reliable systems. Your expertise will enable rapid experimentation and seamless deployment of large-scale multi-omic models, empowering researchers to advance groundbreaking scientific discoveries.

Responsibilities:

  • Architect, develop, and optimize robust ML training and inference infrastructure capable of supporting large-scale genomic foundation models.
  • Design and implement scalable and efficient distributed computing platforms leveraging cloud (AWS/GCP/Azure) and HPC clusters.
  • Develop highly automated, reproducible data pipelines and CI/CD workflows that accelerate model development, testing, and deployment.
  • Performance-tune infrastructure and models, optimizing resource utilization (GPU/TPU) and significantly improving computational efficiency.
  • Collaborate cross-functionally with ML researchers, bioinformaticians, and scientists to translate research needs into scalable engineering solutions.
  • Ensure system reliability, robustness, and high availability by proactively implementing comprehensive monitoring, logging, and alerting.
  • Champion infrastructure-as-code (IaC) practices, promoting clarity, reproducibility, security, and auditability.
Qualifications:

  • Demonstrated ability to solve complex problems independently, with exceptional troubleshooting and system debugging skills.
  • Excellent communication skills and experience collaborating within multidisciplinary teams.
  • Experience designing and deploying scalable, distributed ML infrastructure in cloud and/or hybrid HPC environments.
  • Proficiency in Kubernetes, Docker, Terraform (or equivalent infrastructure automation tools), and cloud services (AWS, GCP, Azure).
  • Deep experience with ML workflow orchestration tools (e.g., Kubeflow, Ray, Airflow, Metaflow).
  • Excellent programming skills in Python; experience with Bash, Go, or C++ is a plus.
  • Strong understanding of ML frameworks (PyTorch, TensorFlow, JAX) and familiarity with distributed training methods, GPU acceleration, and optimization and communication libraries (e.g., XLA, NCCL).
  • Excellent understanding of software development best practices, CI/CD, and automation.
Preferred qualifications:

  • Familiarity with GPU/TPU acceleration and performance optimization (e.g., XLA, NCCL).
  • Experience with bioinformatics or biological data handling.
  • Knowledge of data governance, compliance, and security standards relevant to healthcare or biotech.