Data Scientist - Deep Learning (Hybrid)

Caris Life Sciences, Irving, TX

About The Position

At Caris, we understand that cancer is an ugly word, one no one wants to hear but one that connects us all. That’s why we’re not just transforming cancer care; we’re changing lives. We introduced precision medicine to the world and built an industry around the idea that every patient deserves answers as unique as their DNA. Backed by cutting-edge molecular science and AI, we ask ourselves every day: “What would I do if this patient were my mom?” That question drives everything we do. But our mission doesn’t stop with cancer. We’re pushing the frontiers of medicine and leading a revolution in healthcare, driven by innovation, compassion, and purpose. Join us in our mission to improve the human condition across multiple diseases. If you’re passionate about meaningful work and want to be part of something bigger than yourself, Caris is where your impact begins.

Position Summary

Caris Life Sciences is seeking a creative, driven, and technically strong Data Scientist - Deep Learning to join our Computational Pathology team. This role focuses on developing large-scale, generalizable machine learning models that learn rich representations from complex, high-dimensional data to support translational research and biomarker discovery. The successful candidate will play a central role in shaping Caris’ next-generation AI capabilities by designing scalable training pipelines, advancing representation learning approaches, and collaborating closely with scientific and clinical experts. This position is ideal for individuals with a strong background in deep learning, transformer-based architectures, and computational pathology who are excited about building foundation-level modeling frameworks rather than task-specific solutions.

Requirements

PhD in Computer Science, Data Science, Computational Biology, Bioinformatics, Engineering, Mathematics, or a related quantitative field with exposure to biological or medical data.
0–4 years of experience applying machine learning or deep learning in research or industry settings (postdoctoral experience acceptable).
Strong understanding of deep learning model training, optimization, and evaluation.
Hands-on experience with transformer-based models, including both language-focused and vision-focused architectures.
Proficiency in Python and PyTorch.
Hands-on experience with distributed training (e.g., PyTorch DDP, multi-GPU or multi-node workflows).
Experience working in Linux environments and using Git for version control.
Ability to work with large datasets and complex data pipelines.
Strong written and verbal communication skills.

Nice To Haves

Background in computational pathology or experience working with large-scale imaging data.
Experience training large representation models or foundation models.
Familiarity with self-supervised and representation learning techniques, such as contrastive learning, DINO-style approaches, or related methods.
Experience working with multiple data sources in unified modeling frameworks.
Experience with cloud-based machine learning environments (e.g., AWS, SageMaker), including distributed training workflows.
Strong engineering mindset with attention to reproducibility, scalability, and model robustness.
Background in biomedical, translational, or applied research environments.

Responsibilities

Design, train, and evaluate foundation-style machine learning models that learn robust and reusable representations from large-scale datasets.
Develop and maintain scalable model training infrastructure using PyTorch and distributed training paradigms (e.g., multi-GPU and multi-node setups).
Train and adapt transformer-based architectures for representation learning across diverse data sources.
Apply self-supervised, weakly supervised, and representation learning techniques to leverage partially labeled or unlabeled data.
Build flexible modeling frameworks capable of integrating multiple data sources and heterogeneous signals.
Collaborate with pathologists, scientists, and engineers to ensure models are biologically meaningful and aligned with translational research goals.
Process, curate, and analyze large, complex datasets using efficient and reproducible workflows.
Support exploratory analyses, downstream modeling, and internal research initiatives using learned representations.
Contribute to internal technical documentation, research outputs, and long-term modeling strategy.
Follow best practices in software engineering, experiment tracking, and collaborative model development.