Data Engineer

CVS Health•New York, NY

4d•$118,102 - $173,040•Remote

About The Position

We’re building a world of health around every individual, shaping a more connected, convenient and compassionate health experience. At CVS Health®, you’ll be surrounded by passionate colleagues who care deeply, innovate with purpose, hold ourselves accountable and prioritize safety and quality in everything we do. Join us and be part of something bigger: helping to simplify health care one person, one family and one community at a time.

Position Summary: Caremark LLC, a CVS Health company, is hiring for the following role in New York, NY: Data Engineer, to develop, build and manage large-scale data structures, pipelines and efficient Extract/Transform/Load (ETL) workflows that address complex problems and support business applications.

Requirements

Master’s degree (or foreign equivalent) in Computer Science, Data Science, Statistics, Mathematics, Analytics, or a related field
Completion of a university-level course, research project, internship, thesis, or 6 months of experience in CI/CD, Jenkins, Git, or DevOps
Completion of a university-level course, research project, internship, thesis, or 6 months of experience in SAS or SQL
Completion of a university-level course, research project, internship, thesis, or 6 months of experience in Java, Python, or R
Completion of a university-level course, research project, internship, thesis, or 6 months of experience in Azure, Amazon Web Services (AWS), or Google Cloud Platform (GCP)
Completion of a university-level course, research project, internship, thesis, or 6 months of experience in Hadoop and HDFS
Completion of a university-level course, research project, internship, thesis, or 6 months of experience in Spark, PySpark, or Scala
Completion of a university-level course, research project, internship, thesis, or 6 months of experience in Machine learning, statistical analysis, and predictive modeling
Completion of a university-level course, research project, internship, thesis, or 6 months of experience in Infrastructure components: GPUs or CPUs
Completion of a university-level course, research project, internship, thesis, or 6 months of experience in NLP (Scikit-Learn, spaCy, PyTorch, or Spark NLP)
Completion of a university-level course, research project, internship, thesis, or 6 months of experience in Extract/Transform/Load (ETL) processes
Completion of a university-level course, research project, internship, thesis, or 6 months of experience in Cloud components including cluster management
Completion of a university-level course, research project, internship, thesis, or 6 months of experience in Data warehousing
Completion of a university-level course, research project, internship, thesis, or 6 months of experience in Big Data implementation
Completion of a university-level course, research project, internship, thesis, or 6 months of experience in Machine learning algorithms
Completion of a university-level course, research project, internship, thesis, or 6 months of experience in Designing data architectures, including data pipelines, distributed computing engines, and machine learning infrastructure design
Completion of a university-level course, research project, internship, thesis, or 6 months of experience in Contributing to large-scale applications development, data science, or data analytics projects.

Responsibilities

Develop large-scale data structures and pipelines to organize, collect and standardize data to generate insights and address reporting needs.
Write ETL (Extract/Transform/Load) processes, design database systems, and develop tools for real-time and offline analytic processing that improve existing systems and expand capabilities.
Collaborate with Data Science team to transform data and integrate algorithms and models into automated processes.
Test and maintain systems and troubleshoot malfunctions.
Leverage knowledge of Hadoop architecture, HDFS commands, and query design and optimization to build data pipelines.
Utilize programming skills in Python, Java, or similar languages to build robust data pipelines and dynamic systems.
Build data marts and data models to support Data Science and other internal customers.
Integrate data from a variety of sources and ensure adherence to data quality and accessibility standards.
Analyze current information technology environments to identify and assess critical capabilities and recommend solutions to complex business problems.
Experiment with available tools and advise on new tools to provide optimal solutions that meet the requirements dictated by the model/use case.