Data Engineer I

Valenz
Phoenix, AZ
Remote

About The Position

As a Data Engineer I, you’ll play a hands-on role in building and supporting scalable data pipelines within our cloud-based Lakehouse environment (Azure Databricks, Delta Lake), leveraging tools like Spark and PySpark. You’ll help bring in healthcare data from a variety of sources, ensuring it’s accurate, reliable, and ready to support analytics and reporting needs across the organization. You’ll also partner closely with the broader Analytics team to make sure data is delivered in a way that’s clear and actionable. Over time, you’ll build expertise in managing large, complex datasets and contribute to evolving our data architecture to support new and emerging data sources as the business grows.

Requirements

  • 1+ years of work experience in a data engineering role.
  • Bachelor’s degree or greater in a quantitative field such as statistics, mathematics, engineering, computer science, finance, or economics, or equivalent practical experience.
  • Hands-on experience with Databricks (Spark, PySpark, Delta Lake) and/or migrating RDBMS systems to a data lakehouse.
  • Experience working with the most common types of healthcare data (medical claims, eligibility, provider network rosters, Rx claims, etc.) from a variety of sources.
  • Strong organizational and time management skills to balance multiple projects with limited supervision.
  • Ability to build (and re-evaluate) a process from the ground up.
  • Strong investigative skills with ability to search beyond the initial results.
  • High attention to detail, with a strong drive to test and double-check your own results.
  • Comfortable working with messy data and ambiguous results.
  • Hands-on experience with SQL and Python (including PySpark) for distributed data processing.
  • Experience building and optimizing large-scale distributed data pipelines for both batch and streaming ingestion.

Responsibilities

  • Create and maintain processes to acquire, validate, and enrich data from various sources.
  • Support the migration of on-premises data systems (SQL Server) to a cloud-based lakehouse architecture (Azure Databricks, Delta Lake), including data transformation and pipeline re-architecture.
  • Develop and optimize ETL/ELT pipelines using PySpark and Spark SQL.
  • Implement Lakehouse and Delta architecture best practices to ensure standardized, scalable data storage and processing, including schema enforcement, ACID transactions, and data versioning.
  • Orchestrate data pipelines using Databricks Workflows (Jobs) or similar tools.
  • Implement data quality frameworks, validation checks, and monitoring for pipeline reliability.
  • Optimize performance and cost of data pipelines.
  • Collaborate on CI/CD practices for data pipelines, including testing, deployment, and versioning.
  • Partner with data analysts, data scientists, and business stakeholders to identify new sources of data and estimate feasibility of acquiring specific data sources.
  • Design and implement data models to support analytics, reporting, and data warehousing use cases.
  • Take an active role in agile processes.
  • Perform other duties as assigned.

Benefits

  • Generously subsidized company-sponsored Medical, Dental, and Vision insurance, with access to services through our own products, Healthcare Blue Book and KISx Card.
  • Spending account options: HSA, FSA, and DCFSA
  • 401(k) with company match and immediate vesting
  • Flexible working environment
  • Generous Paid Time Off to include vacation, sick leave, and paid holidays
  • Employee Assistance Program that includes professional counseling, referrals, and additional services
  • Paid maternity and paternity leave
  • Pet insurance
  • Employee discounts on phone plans, car rentals and computers
  • Community giveback opportunities, including paid time off for philanthropic endeavors