Data Engineer II

SCP Health, Atlanta, GA
$74,840 - $123,816

About The Position

At SCP Health, what you do matters. As part of the SCP Health team, you have an opportunity to make a difference. At our core, we work to bring hospitals and healers together in the pursuit of clinical effectiveness. Serving more than 8 million patients through 7,500 providers and 400 healthcare facilities across 30 states, SCP Health is a leader in clinical practice management spanning the entire continuum of care, including emergency medicine, hospital medicine, wellness, telemedicine, intensive care, and ambulatory care. The company has a strong track record of providing excellent work/life balance, a comprehensive benefits package, and competitive compensation, and is committed to fostering an inclusive culture of belonging and empowerment through its core values: collaboration, courage, agility, and respect. The company is also recognized as a #1 Top Workplace in its area.

Requirements

  • Core SQL & Programming: Strong proficiency in writing and optimizing complex SQL queries. Competency in Python for data scripting, API interactions, and basic automation tasks.
  • Snowflake Proficiency: Solid working knowledge of Snowflake fundamentals, including virtual warehouses, stages, and the use of Tasks and Streams for change data capture (CDC); see the stream-and-task sketch following this list.
  • Data Transformation & Medallion Logic: Practical experience using dbt (data build tool) to move data through Bronze, Silver, and Gold layers. Ability to apply business logic to transform raw clinical data into structured, joinable tables.
  • Healthcare Data Literacy: Familiarity with healthcare-specific data formats (HL7, FHIR, or flat-file EMR extracts). Understanding of how clinical data (diagnoses, procedures, provider IDs) supports clinical, financial, and operational workflows; see the FHIR flattening sketch following this list.
  • Data Quality, Observability & Operations: Ability to implement automated tests and monitoring (e.g., null/threshold checks, freshness checks, alerts) and troubleshoot pipeline issues using root-cause analysis and runbooks to restore service safely (a sample check runner appears after this list).
  • Problem Solving: A disciplined approach to troubleshooting data discrepancies between source systems and the Data Platform.
  • Data Modeling & Warehousing Concepts: Knowledge of dimensional modeling (star/snowflake schemas), slowly changing dimensions (SCD), and the tradeoffs between normalized and denormalized designs for analytics and reporting workloads (an SCD Type 2 sketch follows this list).
  • ETL/ELT & Data Integration Patterns: Ability to design reliable batch and near-real-time loads using incremental strategies (e.g., watermarking, CDC patterns), idempotent processing, and backfill/reprocessing techniques; working knowledge of RESTful APIs, authentication (API keys/OAuth), and common data formats (JSON, CSV, Parquet). A watermark-based incremental load sketch follows this list.
  • Engineering Practices (CI/CD, Version Control & Code Quality): Familiarity with automated build/test/deploy pipelines for analytics engineering (e.g., dbt jobs), environment promotion (dev/test/prod), and rollback approaches; ability to follow code review standards and create reusable components (macros, shared modules) using consistent conventions.
  • Documentation, Metadata & Communication: Skill in producing clear technical documentation (data dictionaries, lineage notes, operating procedures), maintaining key metadata (definitions, ownership, refresh cadence), and explaining data concepts and tradeoffs to technical and non-technical partners.
  • Stakeholder Partnership: Ability to gather requirements, ask clarifying questions, and translate business needs (e.g., revenue cycle, coding, scheduling) into scalable data solutions with well-defined acceptance criteria.
  • Prioritization & Ownership: Ability to manage multiple initiatives, communicate progress and risks early, and take end-to-end ownership from design through production support in an Agile environment.
  • Security, Privacy & Access Controls: Demonstrated discretion handling sensitive data; working knowledge of least-privilege RBAC and auditing concepts; ability to follow HIPAA-aligned handling practices and escalate potential compliance concerns appropriately.
  • Cost Awareness (Snowflake/FinOps): Ability to interpret warehouse usage and query profiles, apply practical cost controls (resource monitors, scheduling, right-sizing), and balance performance with consumption.
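
As a concrete illustration of the stream-and-task CDC pattern named above, here is a minimal sketch using snowflake-connector-python. All object names (RAW.PATIENT_VISITS, SILVER.PATIENT_VISITS, the ETL_WH warehouse) and the merge logic are hypothetical placeholders, not SCP Health's actual pipeline.

```python
# Minimal sketch of Snowflake Streams + Tasks for CDC, issued through
# snowflake-connector-python. Object names (RAW.PATIENT_VISITS,
# SILVER.PATIENT_VISITS, ETL_WH) and credentials are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",
    user="etl_user",
    password="...",          # use a secrets manager in practice
    warehouse="ETL_WH",
    database="ANALYTICS",
)
cur = conn.cursor()

# 1. A stream records row-level changes on the bronze table.
cur.execute("""
    CREATE STREAM IF NOT EXISTS RAW.PATIENT_VISITS_STREAM
    ON TABLE RAW.PATIENT_VISITS
""")

# 2. A task merges captured changes into silver on a schedule;
#    SYSTEM$STREAM_HAS_DATA skips runs when nothing changed.
#    (Deletes are ignored here for brevity.)
cur.execute("""
    CREATE TASK IF NOT EXISTS RAW.PATIENT_VISITS_CDC_TASK
      WAREHOUSE = ETL_WH
      SCHEDULE = '15 MINUTE'
      WHEN SYSTEM$STREAM_HAS_DATA('RAW.PATIENT_VISITS_STREAM')
    AS
      MERGE INTO SILVER.PATIENT_VISITS tgt
      USING RAW.PATIENT_VISITS_STREAM src
        ON tgt.visit_id = src.visit_id
      WHEN MATCHED THEN UPDATE SET status = src.status
      WHEN NOT MATCHED THEN INSERT (visit_id, status)
        VALUES (src.visit_id, src.status)
""")

# Tasks are created suspended; resume to start the schedule.
cur.execute("ALTER TASK RAW.PATIENT_VISITS_CDC_TASK RESUME")
```

Because the MERGE reads from the stream inside a task, each run consumes the stream's offset, so changes are processed exactly once even if the task runs on a fixed schedule.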
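
For the healthcare data literacy bullet, a small sketch of flattening a FHIR R4 Patient resource into a bronze-ready record. The sample resource and the output columns are illustrative; real extracts would cover many more fields and resource types.

```python
# Sketch: flatten a FHIR R4 Patient resource (JSON) into a flat record
# suitable for a bronze-layer load. Field paths follow the FHIR R4 spec;
# the sample resource and output schema are illustrative.
import json

sample = json.loads("""
{
  "resourceType": "Patient",
  "id": "example-123",
  "name": [{"family": "Doe", "given": ["Jane"]}],
  "gender": "female",
  "birthDate": "1987-04-02"
}
""")

def flatten_patient(resource: dict) -> dict:
    """Pull a minimal set of analytics-ready fields from a Patient."""
    name = (resource.get("name") or [{}])[0]
    return {
        "patient_id": resource.get("id"),
        "family_name": name.get("family"),
        "given_name": " ".join(name.get("given", [])),
        "gender": resource.get("gender"),
        "birth_date": resource.get("birthDate"),
    }

print(flatten_patient(sample))
# {'patient_id': 'example-123', 'family_name': 'Doe', ...}
```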
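
For the data quality bullet, a stand-alone sketch of null-rate and freshness checks of the kind a pipeline would automate (in practice often expressed as dbt tests or monitoring rules). The rows, columns, and thresholds are made up for illustration.

```python
# Sketch: lightweight null-rate and freshness checks over a batch of
# loaded rows. Thresholds and sample data are illustrative.
from datetime import datetime, timedelta, timezone

rows = [
    {"visit_id": 1, "provider_id": "P01", "loaded_at": datetime.now(timezone.utc)},
    {"visit_id": 2, "provider_id": None,  "loaded_at": datetime.now(timezone.utc)},
]

def null_check(rows, column, max_null_rate=0.05):
    """Fail when too many rows are missing a required value."""
    nulls = sum(1 for r in rows if r[column] is None)
    rate = nulls / len(rows)
    return rate <= max_null_rate, f"{column} null rate {rate:.0%}"

def freshness_check(rows, column, max_age=timedelta(hours=6)):
    """Fail when the newest row is older than the freshness target."""
    newest = max(r[column] for r in rows)
    age = datetime.now(timezone.utc) - newest
    return age <= max_age, f"newest {column} is {age} old"

for passed, detail in (null_check(rows, "provider_id"),
                       freshness_check(rows, "loaded_at")):
    print("PASS" if passed else "FAIL", "-", detail)
```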
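
For the dimensional modeling bullet, a plain-Python sketch of Type 2 slowly changing dimension handling: when a tracked attribute changes, the current row is closed out and a new versioned row is opened. In a warehouse this would typically be a MERGE; the column names here are illustrative.

```python
# Sketch: slowly changing dimension (Type 2) logic in plain Python.
# On an attribute change, the current row is expired and a new
# versioned row is appended. Column names are illustrative.
from datetime import date

dim = [  # existing dimension rows
    {"provider_id": "P01", "specialty": "EM", "valid_from": date(2023, 1, 1),
     "valid_to": None, "is_current": True},
]

def apply_scd2(dim, incoming, today):
    for rec in incoming:
        current = next((r for r in dim
                        if r["provider_id"] == rec["provider_id"]
                        and r["is_current"]), None)
        if current and current["specialty"] == rec["specialty"]:
            continue                       # no change: nothing to do
        if current:                        # change: expire current row
            current["valid_to"] = today
            current["is_current"] = False
        dim.append({**rec, "valid_from": today,
                    "valid_to": None, "is_current": True})

apply_scd2(dim, [{"provider_id": "P01", "specialty": "HM"}], date(2024, 6, 1))
for row in dim:
    print(row)   # one expired EM row, one current HM row
```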
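
For the ETL/ELT bullet, a watermark-based incremental load demonstrated with the standard-library sqlite3 module so it runs stand-alone; the same pattern transfers to Snowflake. Table and column names are hypothetical.

```python
# Sketch: incremental load with a high-water mark, using sqlite3 so it
# runs stand-alone. Re-running is idempotent: only rows newer than the
# stored watermark are pulled, and the upsert is keyed on visit_id.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE source (visit_id INT PRIMARY KEY, updated_at TEXT);
    CREATE TABLE target (visit_id INT PRIMARY KEY, updated_at TEXT);
    CREATE TABLE watermarks (tbl TEXT PRIMARY KEY, high_water TEXT);
    INSERT INTO source VALUES (1, '2024-06-01'), (2, '2024-06-02');
    INSERT INTO watermarks VALUES ('target', '2024-05-31');
""")

def incremental_load(db):
    (wm,) = db.execute(
        "SELECT high_water FROM watermarks WHERE tbl = 'target'").fetchone()
    new_rows = db.execute(
        "SELECT visit_id, updated_at FROM source WHERE updated_at > ?",
        (wm,)).fetchall()
    db.executemany(
        "INSERT INTO target VALUES (?, ?) "
        "ON CONFLICT(visit_id) DO UPDATE SET updated_at = excluded.updated_at",
        new_rows)
    if new_rows:  # advance the watermark only after a successful load
        db.execute("UPDATE watermarks SET high_water = ? WHERE tbl = 'target'",
                   (max(r[1] for r in new_rows),))

incremental_load(db)
print(db.execute("SELECT * FROM target").fetchall())  # both rows loaded
```

Advancing the watermark only after the load succeeds is what makes reruns and backfills safe: a failed run leaves the watermark untouched, so the next run simply picks the rows up again.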

Responsibilities

  • Pipeline Development: Design, build, and maintain scalable data pipelines to ingest internal application data (Scheduling, HR, Finance, etc.) and external clinical data (EHR extracts, HL7, FHIR) into the Data Platform.
  • Medallion Lifecycle Management: Design and implement transformation logic to move data from Bronze to Gold, ensuring adherence to data quality standards, documentation expectations, and business rules.
  • Domain Stewardship: Partner with business and technical stakeholders to map source system values to standardized enterprise models, supporting core workflows and consistent definitions across teams.
  • Performance, Reliability & Cost Optimization: Monitor and optimize Snowflake usage, query performance, and pipeline latency; apply practical cost controls (e.g., right-sizing warehouses, resource monitors) and ensure dependable batch and near-real-time data availability (a resource monitor sketch follows this list).
  • Data Governance, Security & Access Controls: Implement HIPAA-compliant data handling practices, including role-based access control, row-level security, data masking, and audit logging; support access request validation and periodic access reviews for sensitive datasets (a masking sketch follows this list).
  • Integration, Source Onboarding & Reusable Patterns: Partner with the Facility Integration team and App Dev teams to onboard new sources; profile data, document source-to-target mappings, and build reusable ingestion/validation patterns that support reliable handoffs and downstream consumption.
  • On-Call Support & Incident Response: Participate in an on-call rotation to respond to pipeline failures and data availability issues; triage incidents, communicate status to stakeholders, and drive issues to resolution with appropriate post-incident follow-up.
  • Data Quality, Testing & Service Levels: Develop and maintain automated tests and reconciliation checks (e.g., row counts, referential integrity, threshold checks); define and monitor data freshness, completeness, and availability targets for key datasets.
  • Documentation, Metadata & Standards: Create and maintain pipeline documentation, data dictionaries, runbooks, and key metadata (definitions, owners, refresh cadence) to improve discoverability, auditability, and consistent engineering practices.
  • Release & Change Management: Coordinate safe deployment of data pipeline and model changes across environments (dev/test/prod), ensuring version control, peer reviews, and rollback plans are followed.
  • Requirements & Delivery Partnership: Work with business and analytics partners to clarify requirements, define acceptance criteria, and deliver curated datasets that support reporting, dashboards, and downstream operational workflows.
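
As an illustration of the cost controls mentioned above, a sketch that creates a Snowflake resource monitor and attaches it to a warehouse via snowflake-connector-python. The quota, trigger thresholds, warehouse name, and credentials are placeholders to be adapted to actual consumption patterns.

```python
# Sketch: cap and monitor warehouse spend with a Snowflake resource
# monitor, issued through snowflake-connector-python. The quota,
# trigger thresholds, warehouse name, and credentials are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",
    user="admin_user",
    password="...",          # requires ACCOUNTADMIN-level privileges
)
cur = conn.cursor()

# Notify at 80% of the monthly credit quota; suspend at 100%.
cur.execute("""
    CREATE OR REPLACE RESOURCE MONITOR etl_monthly_monitor
      WITH CREDIT_QUOTA = 100
           FREQUENCY = MONTHLY
           START_TIMESTAMP = IMMEDIATELY
      TRIGGERS ON 80 PERCENT DO NOTIFY
               ON 100 PERCENT DO SUSPEND
""")

# Attach the monitor to the ETL warehouse.
cur.execute("ALTER WAREHOUSE ETL_WH SET RESOURCE_MONITOR = etl_monthly_monitor")
```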
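
As an illustration of the data masking responsibility, a stand-alone sketch of deterministic pseudonymization for PHI-bearing fields. In Snowflake this is usually implemented with masking policies; the salt handling and field list here are illustrative only.

```python
# Sketch: deterministic masking/pseudonymization for PHI-bearing columns
# before data lands in broadly readable layers. The salt and field list
# are illustrative; in production the salt comes from a secrets manager.
import hashlib

SALT = "rotate-me"          # placeholder; never hard-code in practice
PHI_FIELDS = {"patient_name", "ssn"}

def pseudonymize(value: str) -> str:
    """Stable surrogate: equal inputs map to equal tokens (joinable)."""
    return hashlib.sha256((SALT + value).encode()).hexdigest()[:16]

def mask_record(record: dict) -> dict:
    return {k: pseudonymize(v) if k in PHI_FIELDS and v else v
            for k, v in record.items()}

row = {"patient_name": "Jane Doe", "ssn": "123-45-6789", "visit_id": 42}
print(mask_record(row))
# patient_name / ssn replaced with stable tokens; visit_id untouched
```

A deterministic hash keeps masked values joinable across tables, which is what makes de-identified datasets still usable for analytics.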

Benefits

  • Comprehensive benefits package and competitive compensation
  • Medical, dental, and vision insurance
  • 401(k) plan with a company match
  • Paid time off and holidays
  • Professional development support
  • Employee wellness resources

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: None listed
  • Number of Employees: 501-1,000
