Data Engineer

Salma Health•Sacramento, CA

1d•$119,000 - $185,000•Hybrid

About The Position

We are looking to hire a Data Engineer to join our team as we build the data backbone for a mental and behavioral health practice. This role will build the platform that turns appointments, assessments, billing, and patient engagement data into the metrics our clinical and operations teams rely on. As a mid-level data engineer, you'll own meaningful pieces of our pipeline end-to-end: from pulling data out of third-party APIs, through medallion architecture transformations in dbt, to exposing curated metrics through our semantic layer. This is a hands-on role on a small team. You'll write code that runs in production every day, ship improvements weekly, and have direct visibility into how the data is used. We work in a HIPAA-regulated environment, so thoughtfulness about data handling is part of the job.

Requirements

4-7 years of professional experience building and operating data pipelines in production.
From conversation to shipped data product: you're comfortable owning a request end-to-end, which means scoping it with a non-technical stakeholder, writing requirements clear enough that you (and others) can build against them, implementing the models or metrics, and verifying with the stakeholder that what shipped solves their problem.
Strong Python: comfortable writing modules, structuring code for reuse and testability, and debugging issues across an async or orchestrated pipeline.
Solid SQL skills, including window functions, CTEs (including recursive ones), and the ability to reason about query performance.
Hands-on experience with dbt: building models, writing tests, and understanding materializations.
Working knowledge of an orchestration framework (Dagster, Airflow, Prefect, or similar), including the mental model of assets/tasks, dependencies, and scheduling.
Comfort with AWS fundamentals: S3, IAM, Secrets Manager, and either ECS or Lambda for compute.
Git-based workflows: code review and writing PRs that are easy to review.

Nice To Haves

Experience with Dagster specifically.
Experience with semantic layer tools (Cube.js, dbt Semantic Layer/MetricFlow, LookML).
Healthcare data experience (HIPAA, EHR systems, ICD-10/CPT codes).
CloudFormation, Terraform, or another IaC tool.
Experience with GraphQL APIs as a consumer (pagination, introspection, dealing with rate limits and retries).
Familiarity with identity resolution patterns or slowly-changing dimension modeling.

Responsibilities

Maintaining and improving the orchestration layer: Dagster assets, jobs, schedules, sensors, and the dependency graph that ties extraction → loading → transformation together.
Adding new data sources to the pipeline: extracting from APIs (GraphQL, REST), Google Drive folders, and CSV/JSONL drops on S3, then landing them in our bronze schemas via Dagster assets.
Building silver and gold dbt models that transform raw source data into our unified entity model, following the medallion architecture.
Extending our semantic layer so business metrics are available to downstream consumers (BI tool dashboards, AI agents, ad-hoc analysis) without re-deriving logic.
Operating the platform on AWS: ECS Fargate services, RDS, S3, Secrets Manager, CloudFormation templates, and the CodePipeline-based CI/CD that deploys our data platform. All of our data platforms are deployed with IaC tools.
Writing tests (pytest for Python, dbt tests for models, data quality tests) and contributing to internal documentation as new patterns emerge.