Principal Data Engineer

CVS Health•Hartford, CT

4d•$144,200 - $288,400•Hybrid

About The Position

We’re building a world of health around every individual, shaping a more connected, convenient and compassionate health experience. At CVS Health®, you’ll be surrounded by passionate colleagues who care deeply, innovate with purpose, hold ourselves accountable and prioritize safety and quality in everything we do. Join us and be part of something bigger: helping to simplify health care one person, one family and one community at a time.

Position Summary: Caremark LLC, a CVS Health company, is hiring for the following role in Hartford, CT: Principal Data Engineer to develop large scale data structures and pipelines to organize, collect and standardize data that helps generate insights and addresses reporting needs. Telecommuting available. Multiple positions.

Requirements

Master’s degree (or foreign equivalent) in Computer Science, Computer Information Systems, Data Science, Statistics, Mathematics, Analytics, or a related field
Two (2) years of experience in the job offered or related occupation.
Two (2) years of experience in Cloud migration technologies: Azure, Amazon Web Services (AWS), or Google Cloud Platform (GCP)
Two (2) years of experience with the Kafka messaging platform
Two (2) years of experience with a containerization runtime platform
Two (2) years of experience in solution architecture, design, and end-to-end delivery of projects
Two (2) years of experience in domain support for a healthcare or retail organization
Two (2) years of experience building Proof of Value (PoV) and MVP solutions using AI: Generative AI, AutoML, or Virtual AI Databases
Two (2) years of experience providing guidance on Large Language Model (LLM) selection and use of Minimum Viable Products (MVPs)
Two (2) years of experience conducting Data Quality assessments and defining Data Governance processes through Data Quality and MLOps
Two (2) years of experience establishing data architectures and best practices

Responsibilities

Develop large scale data structures and pipelines to organize, collect and standardize data that helps generate insights and addresses reporting needs.
Collaborate with the data science team to transform data and integrate algorithms and models into automated processes.
Build data marts and data models to support Data Science and other internal customers.
Integrate data from a variety of sources, ensuring that they adhere to data quality and accessibility standards.
Analyze current information technology environments to identify and assess critical capabilities and recommend solutions.
Build high-performance data processing frameworks leveraging cloud and/or on-premises data platforms.
Design and build large-scale data structures, pipelines, and efficient Extract/Transform/Load (ETL) workflows.
Write ETL (Extract/Transform/Load) processes, design database systems, and develop tools for real-time and offline analytic processing.
Transform data and integrate algorithms and models into automated processes.
Analyze and synthesize data to meet the insights, reporting dashboard, and descriptive/predictive/prescriptive analytic requirements.
Design conformed, aggregated, and semantic data layers, and manipulate large datasets to support insights and analytics using SQL, BTEQ, SAS, and similar tools, as applicable.
Manage data while building data layers in a sandbox or production environment for reporting and analytical use cases.
Work on "big data" platforms, including Hadoop (Azure or GCP preferred) and Spark, as applicable.
Design data models and solutions for analytical and reporting use cases.
Apply knowledge of Hadoop architecture and HDFS commands, along with experience designing and optimizing queries, as applicable, to build data pipelines.
Use strong programming skills in Python, Java, and/or any of the major languages to build robust data pipelines and dynamic systems.
Experiment with available software tools and advise on new tools in order to determine the optimal solution given the requirements dictated by the model/use case.
Support modeling/diagramming and build design specifications for data objects and surrounding data processing logic.
Collaborate with business solution strategists and support the new data source onboarding process through data discovery, profiling, and mapping.
Participate in proofs of concept to build the data layers and concepts to derive analytical insights.
Leverage multiple tools and programming languages to analyze and manipulate data sets from disparate data sources.