Head of Data Platform

A2Z Sync | Denver, CO

About The Position

We operate a multi-tenant automotive SaaS platform serving thousands of dealer groups across the United States. Our data layer — MySQL, Aurora, DynamoDB with DynamoDB Streams, S3, Glue Data Catalog — has grown to support a complex, high-throughput transactional platform. That layer works. Now we need to make it intelligent.

We are building a Dealer Intelligence Platform: a closed-loop system that observes raw signals from dealer operations, predicts outcomes, optimizes decisions under constraints, acts through approved channels, and learns from what happened. Pricing optimization, lead routing, inventory mix planning, service bay scheduling — each is a self-contained optimization function that consumes features, scores predictions, and closes the loop with action telemetry.

This role owns the entire data substrate that makes that loop possible — the lake, the feature store, the model registry, the action ledger, and the governance framework that keeps it all tenant-isolated and audit-grade. You are not inheriting a finished architecture; you are designing the one that turns a transactional platform into a decision engine.

Requirements

  • 7+ years in data engineering or data architecture with at least 2 years in a platform-level architect or Head-of role.
  • Deep experience with both relational (MySQL/PostgreSQL/Aurora) and NoSQL (DynamoDB, DynamoDB Streams) data modeling — and strong opinions on when each is appropriate.
  • Hands-on experience building or operating feature stores — online serving, offline training, point-in-time correctness, feature freshness SLOs.
  • Hands-on experience with AWS data and ML services: Glue, Athena, S3, DynamoDB, DynamoDB Streams, Aurora, Lake Formation, SageMaker.
  • A current, practicing AI/Gen-AI engineer — you have built or operated systems that prepare data for LLMs, embeddings, or ML models in production. This is not theoretical interest; it is recent, hands-on experience.
  • Experience with streaming and CDC patterns: DynamoDB Streams, Kinesis, EventBridge, or Kafka for real-time data propagation and feature materialization.
  • Experience with data pipeline orchestration: NiFi, Airflow, Step Functions, or equivalent.
  • Understanding of data migration patterns: dual-write, change data capture (CDC), reconciliation validation, zero-downtime cutover.
  • Experience with multi-tenant data architectures — database-per-tenant, schema-per-tenant, Row-Level Security, and the judgment to know which trade-offs matter at 5,000+ tenants.
  • Strong data governance instincts: retention policies, PII handling, audit trails, cost attribution, and the discipline to enforce them before the first model ships.
  • The ability to write a data architecture doc that engineers can implement without ambiguity, and the judgment to know when to write the SQL or pipeline code yourself instead.

Nice To Haves

  • Hands-on experience with SageMaker — training pipelines, model registry, Model Monitor, and production model lifecycle (shadow → canary → drift watch → auto-rollback).
  • Hands-on experience with AWS Bedrock — Knowledge Bases, model invocation, guardrails, and production deployment of foundation models.
  • Experience building embeddings pipelines — vectorizing structured and unstructured data for semantic search, recommendations, or retrieval-augmented generation.
  • Experience designing retrieval systems — chunking strategies, metadata filtering, re-ranking, and evaluating retrieval quality.
  • Experience building closed-loop data systems — action telemetry, outcome attribution, A/B holdout management, and lift measurement.
  • Experience with data quality and anomaly detection at scale — automated monitoring for schema drift, null rates, freshness SLAs, and feature/training skew.
  • Understanding of multi-tenant AI governance: PII redaction, tenant-scoped inference, per-dealer model routing, cost attribution, and audit logging.
  • Experience with automotive, fintech, or multi-tenant marketplace data, including their compliance-driven retention requirements.
  • Familiarity with data formats and protocols from third-party providers (CDK, DealerTrack, Tekion).
  • Experience with constrained optimization, assignment problems, or scheduling solvers at the data layer.
  • Background with event-driven and streaming data architectures (EventBridge, DynamoDB Streams, Kafka, CDC streams).
  • Experience with vector databases (OpenSearch, Pinecone) for production AI workloads.
  • Experience with Iceberg, Delta Lake, or other open table formats for lakehouse architectures.

Responsibilities

  • Own the data platform end to end: Bronze (raw + telemetry), Silver (canonical entities via Standardization Agent), Gold (KPIs + derived features) — plus the Feature Store and Action Ledger that make the optimization loop possible.
  • Design and implement the Feature Store architecture — online (sub-50ms reads for real-time scoring) and offline (point-in-time joins for training). Define the feature contract: owner, freshness SLO, PII tag, training/inference parity, all governed by Lake Formation (a sketch of such a contract follows this list).
  • Develop the Action Ledger — every recommendation, approval, override, and outcome logged as a first-class object. This is the substrate that closes the loop: without it, models cannot retrain, we cannot attribute lift, and we cannot prove value to dealers (see the illustrative ledger record below).
  • Build the model data pipeline — the feature materialization, training data assembly, and serving infrastructure that feeds SageMaker models and Bedrock agents across all optimization functions.
  • Define and execute the data migration strategy for the legacy platform — which tables move to DynamoDB, which consolidate into Aurora Serverless, and how dual-write validation works at every stage.
  • Implement data quality and anomaly detection — automated monitoring for schema drift, null-rate spikes, stale pipelines, and integration data inconsistencies.
  • Establish data governance: retention policies, PII handling, audit trail integrity, and multi-tenant AI governance (per-dealer data isolation for model training, cost attribution for Bedrock inference, FTC-grade action audit).
  • Develop streaming and event-driven data flows: DynamoDB Streams, EventBridge, CDC patterns, and real-time feature materialization (see the stream-handler sketch below).
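
To make the feature-contract bullet concrete, here is a minimal, hypothetical sketch in Python. The contract fields come from the bullet above (owner, freshness SLO, PII tag, training/inference parity); the class name, example feature, and parity check are illustrative assumptions, not an existing A2Z Sync schema.

```python
from dataclasses import dataclass
from datetime import timedelta

# Hypothetical feature contract. The fields mirror the bullet above
# (owner, freshness SLO, PII tag, training/inference parity); the class
# name and example feature are illustrative, not an actual A2Z Sync schema.
@dataclass(frozen=True)
class FeatureContract:
    name: str                 # e.g. "dealer.avg_days_to_turn_30d" (hypothetical)
    owner: str                # team accountable for the pipeline that produces it
    freshness_slo: timedelta  # maximum staleness before the feature is considered stale
    contains_pii: bool        # drives Lake Formation tag-based access controls
    online: bool              # served from the low-latency store for real-time scoring
    offline: bool             # available for point-in-time joins at training time

    def parity_ok(self, online_value, offline_value) -> bool:
        """Training/inference parity: the same entity at the same timestamp
        should yield the same value from the online and offline stores."""
        return online_value == offline_value


example = FeatureContract(
    name="dealer.avg_days_to_turn_30d",
    owner="data-platform",
    freshness_slo=timedelta(hours=1),
    contains_pii=False,
    online=True,
    offline=True,
)
```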
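
Similarly, a minimal sketch of what a single Action Ledger record might look like, assuming a DynamoDB-style item with a tenant-scoped partition key and a time-ordered sort key. The key shape, field names, and status values are illustrative assumptions only.

```python
import uuid
from datetime import datetime, timezone

# Hypothetical Action Ledger item: every recommendation, approval, override,
# and outcome is written as a first-class record. Key shape, field names, and
# status values are illustrative assumptions, not a confirmed schema.
def new_action_record(tenant_id: str, function: str, recommendation: dict) -> dict:
    now = datetime.now(timezone.utc).isoformat()
    return {
        "pk": f"TENANT#{tenant_id}",           # tenant isolation at the key level
        "sk": f"ACTION#{now}#{uuid.uuid4()}",  # time-ordered within a tenant
        "function": function,                  # e.g. "pricing" or "lead_routing"
        "recommendation": recommendation,      # what the model proposed
        "model_version": recommendation.get("model_version"),
        "status": "PROPOSED",                  # PROPOSED -> APPROVED / OVERRIDDEN -> OUTCOME_RECORDED
        "approved_by": None,                   # filled when a user approves or overrides
        "override_reason": None,
        "outcome": None,                       # filled later, for attribution and retraining
        "created_at": now,
    }
```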
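
And for the streaming bullet, a hedged sketch of a Lambda-style handler that consumes a DynamoDB Stream and materializes a derived online feature. The event structure is the standard DynamoDB Streams batch (assuming a stream view type that includes new images); the table fields, feature name, and write_online_feature helper are hypothetical.

```python
from boto3.dynamodb.types import TypeDeserializer

_deserialize = TypeDeserializer().deserialize

def handler(event, context):
    """Hypothetical Lambda handler: consume a DynamoDB Stream batch and
    materialize a derived feature into the online feature store."""
    for record in event.get("Records", []):
        if record.get("eventName") not in ("INSERT", "MODIFY"):
            continue
        # Stream images arrive in DynamoDB's typed JSON; convert to plain Python values.
        image = {k: _deserialize(v) for k, v in record["dynamodb"]["NewImage"].items()}
        tenant_id = image["tenant_id"]                       # hypothetical field
        feature_value = float(image.get("days_on_lot", 0))   # placeholder derivation
        write_online_feature(tenant_id, "vehicle.days_on_lot", feature_value)

def write_online_feature(tenant_id: str, feature_name: str, value: float) -> None:
    # Stand-in for a put to the online store (e.g. a DynamoDB feature table);
    # left as a print so the sketch stays self-contained.
    print(f"{tenant_id} {feature_name}={value}")
```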

Benefits

  • Comprehensive medical, dental, and vision benefits.
  • Employer-provided STD/LTD and life insurance.
  • Matching 401k plan.
  • Unlimited paid time off, including 10 paid holidays.