Senior Data Engineer

Caris Life Sciences, Tempe, AZ

About The Position

At Caris, we understand that cancer is an ugly word—a word no one wants to hear, but one that connects us all. That’s why we’re not just transforming cancer care—we’re changing lives. We introduced precision medicine to the world and built an industry around the idea that every patient deserves answers as unique as their DNA. Backed by cutting-edge molecular science and AI, we ask ourselves every day: “What would I do if this patient were my mom?” That question drives everything we do. But our mission doesn’t stop with cancer. We're pushing the frontiers of medicine and leading a revolution in healthcare—driven by innovation, compassion, and purpose. Join us in our mission to improve the human condition across multiple diseases. If you're passionate about meaningful work and want to be part of something bigger than yourself, Caris is where your impact begins.

Position Summary

The Senior Data Engineer will support our precision medicine and biomarker discovery initiatives. This role is responsible for designing, building, and maintaining scalable, cloud-native data platforms and pipelines that support analytics, machine learning, and computational biology workflows across structured and unstructured, multi-modal datasets. The ideal candidate brings strong software engineering and data architecture expertise, deep experience with AWS cloud services, and a collaborative mindset to partner closely with data scientists, computational biologists, and R&D stakeholders.

Requirements

  • Ph.D. in Computer Science, Engineering, or a related technical field (or equivalent practical experience).
  • 5+ years of professional experience in data engineering, platform engineering, or backend software engineering roles.
  • Strong proficiency in Python and experience building production-grade data pipelines and services.
  • Extensive experience designing and operating data platforms on AWS, including services such as EC2, S3, DynamoDB, EKS/ECS, Lambda, Glue, and Athena.
  • Experience with Infrastructure as Code (IaC) using tools such as Terraform, CloudFormation, or CDK.
  • Expertise in designing, implementing, and maintaining relational and non-relational databases (e.g., MySQL, PostgreSQL, MongoDB).
  • Extensive experience with containerization and orchestration technologies.
  • Strong proficiency with Linux and command-line–based workflows.
  • Familiarity with modern data platform concepts, including data lakes, lakehouses, streaming, and batch processing architectures.
  • Experience applying best practices in DevOps, DataOps, and/or MLOps, including CI/CD, monitoring, and automated testing.
  • Strong communication skills and the ability to collaborate effectively with multidisciplinary scientific and engineering teams.
  • Team-oriented mindset with a passion for building robust platforms that enable data-driven discovery and personalized medicine.

Nice To Haves

  • Familiarity with cancer biology concepts, including tumor genomics and molecular profiling workflows.
  • Experience supporting data pipelines for molecular diagnostics, biomarker discovery, or translational research.
  • Working knowledge of common molecular and clinical data types used in oncology research (e.g., NGS-derived data, variant annotations, expression matrices, clinical metadata).
  • Experience handling high-throughput sequencing–derived data and associated metadata at scale, including ingestion, normalization, and provenance tracking.
  • Understanding of bioinformatics data standards and formats (e.g., FASTQ, BAM/CRAM, VCF, GTF, or similar structured scientific data representations).
  • Familiarity with public cancer and genomics datasets (e.g., TCGA, COSMIC, cBioPortal, GEO, or equivalent resources).
  • Experience collaborating closely with computational biologists, bioinformaticians, and cancer researchers to translate research requirements into scalable data platform solutions.
  • Awareness of data quality, reproducibility, and traceability requirements in regulated or clinically adjacent oncology environments.

Responsibilities

  • Design, build, and maintain scalable, reliable, and secure data pipelines for ingesting, transforming, storing, and serving large, multi-source and multi-omics datasets.
  • Architect and implement cloud-native data solutions on AWS to support analytics workflows, machine learning pipelines, and scientific research.
  • Develop and maintain automation frameworks for data ingestion, processing, validation, and delivery.
  • Build and deploy APIs, services, and data access layers to enable analytics and machine-learning solutions at scale.
  • Develop and deploy applications and workflows in cloud and/or HPC environments, adhering to industry best practices for system architecture, CI/CD, testing, and software design.
  • Partner closely with data scientists, computational biologists, and R&D scientists to design and evolve shared analytics platforms.
  • Optimize data systems for performance, cost efficiency, scalability, and reliability.
  • Ensure data quality, observability, and lineage across pipelines and platforms.
  • Adhere to coding, documentation, security, and compliance standards; manage technical deliverables for assigned projects.
  • Provide general informatics and platform support for laboratory research, technology development, and clinical studies.
  • Contribute to architectural decisions and mentor junior engineers as appropriate.