We’re building a world of health around every individual — shaping a more connected, convenient, and compassionate health experience. At CVS Health®, you’ll be surrounded by passionate colleagues who care deeply, innovate with purpose, hold themselves accountable, and prioritize safety and quality in everything they do. Join us and be part of something bigger – helping to simplify health care one person, one family, and one community at a time.

Position Summary

If you’re eager to make a real impact in the healthcare industry through your own meaningful contributions, join us as we pave the way for technical innovation. At CVS Health, we maintain an extensive repository of healthcare data spanning more than 150 million individuals, providing an unparalleled foundation for ambitious engineers.

In this high-impact, high-autonomy role, you will be a technical innovator and visionary, leading the evolution of our data infrastructure. You will take a lead role in the end-to-end development of critical data self-service platforms designed to modernize how petabyte-scale data is ingested, accessed, and managed. Your work will be instrumental in shifting from traditional, ticket-driven data handling toward a Data Mesh approach, empowering data owners to take full accountability for their data quality through the robust internal tools you build.

As a Staff Data Engineer, you will:

- Architect Petabyte Pipelines: Engineer scalable, reliable, and performant data pipelines to assemble large, intricate datasets using SQL, DBT, and Snowflake, ensuring high data availability and integrity.
- Build Data Platforms: Independently design and maintain internal React (TypeScript) interfaces and Python backend services that automate data ingestion and discovery, reducing lead times for application teams from weeks to minutes.
- Develop Data APIs: Build and maintain production-grade REST and gRPC APIs that serve as the high-performance interface between our Snowflake data layer and downstream consumer touchpoints.
- Modernize Data Operations: Implement a GitOps model for data using GitHub Actions and Argo/Kargo, integrating standardized logging, alerting, and automated observability into the heart of all data products.
- Innovate with AI: Leverage Cursor AI, MCPs, and other AI tooling to accelerate the data engineering SDLC, from optimizing complex SQL queries to automating schema migrations.
- Collaborate and Lead: Communicate with business leaders to translate complex data requirements into functional specifications while mentoring other engineers in modern data architecture and software best practices.

Key Responsibilities

- Data Architecture: Design and optimize high-volume ETL/ELT pipelines using SQL, DBT, and Snowflake, ensuring data is modeled for both analytical and operational use cases.
- Internal Tooling (Full Stack): Develop and maintain internal-facing web applications using React that allow data owners to interact with, monitor, and configure their data pipelines.
- API Development: Architect and implement REST and gRPC APIs in Python that serve as the interface between our Snowflake data layer and downstream consumer applications.
- CI/CD & GitOps: Own the deployment lifecycle of data services and tools, using GitHub Actions for CI and Argo/Kargo for continuous delivery and lifecycle management.
- Self-Service Platforms: Build "Data-as-a-Service" features, such as automated UI-driven ingestion workflows, reducing reliance on manual data engineering tickets.
- AI Integration: Utilize modern AI development tools (e.g., Claude AI) to accelerate the development of both data pipelines and management interfaces.
Job Type: Full-time
Career Level: Senior