Senior Data Architect (Hands-On)

Stratus

Remote

About The Position

The Senior Data Architect owns the canonical data architecture (schema, contracts, tenancy, and governance) that underpins all product and AI/ML workloads. This is a hands-on role: the architect designs, prototypes, and builds reference implementations and in-repo guardrails rather than only producing diagrams. The focus is on building durable, domain-specific data assets for AI, with emphasis on how data is modeled, governed, and made trustworthy. The architect will make the data layer and production data AI/ML-ready and design the integration patterns those workloads depend on. They will also own the canonical data model, establish data architecture standards, manage polyglot persistence, and define the multi-tenant data architecture. The role also leads modernization of the data pipeline and lake/lakehouse layers, moving from homegrown solutions to industry-standard platforms and modernizing legacy data-access patterns. Technical leadership includes driving prototypes, defining patterns, establishing data quality standards, mentoring engineers, and making timely, defensible decisions. Close partnership with database engineering, ML and application engineering, and platform/infrastructure teams is essential.

Requirements

8+ years in data architecture, data engineering, database administration, or analytics engineering, with 3+ years in senior / lead roles.
Demonstrated ownership of a canonical or enterprise data model / cross-product schema.
Hands-on MongoDB at production scale (Atlas M40+ ideal): document modeling, aggregation framework, indexing, change streams, sharding, replica sets.
Strong polyglot-persistence judgment: deciding what belongs in documents vs. relational vs. a vector store, and migrating between them incrementally.
Hands-on relational depth: schema design, indexing strategy, and query tuning, plus familiarity with vector search.
Production experience making data AI/ML-ready: data architecture supporting RAG, semantic search, embeddings / vector pipelines, or agentic workloads.
Multi-tenant architecture experience: data residency and per-tenant cost attribution.
Pipeline / ELT / lake / lakehouse design at scale, with incremental migration strategies.
Cloud-native data services (Azure, AWS, or GCP).
Strong grasp of data quality, testing, lineage, and monitoring — including observability for pipelines and AI/ML serving.
Comfortable modeling a complex, specialized domain.
Appetite to learn the domain is required.

Nice To Haves

Knowledge-graph, ontology, or semantic-layer experience.
CDC and cross-engine sync (MongoDB Change Streams, Debezium, or equivalent).
Lakehouse platforms (Databricks, Snowflake, or open table formats — Iceberg, Delta, Hudi) and feature stores (Feast or equivalent).
Data governance for AI/agent access to production data: query-cost controls, read-path safety, lineage, and audit for higher-risk use cases.
SOC 2 and data-classification experience.
Azure data ecosystem (Data Factory, Synapse, Functions, Event Grid).
MongoDB certification (Associate DBA / Developer or higher) or substantive MongoDB University coursework.
MEP / AEC / construction industry experience.

Responsibilities

Architect the data layer so AI/ML workloads run on a clean, governed substrate.
Make production data AI-ready: well-modeled, contract-enforced, lineage-tracked, and drift-detectable.
Design the data-side integration patterns these workloads depend on, such as feature-store and vector-store patterns.
Own the canonical data model — the normalized definition of the core business objects shared across our products — and decide what is canonical versus tenant-specific.
Establish data architecture standards, data contracts, and schema discipline the rest of engineering builds against, enforced in-repo.
Exercise strong polyglot-persistence judgment: what belongs in document vs. relational vs. vector stores, and how to migrate between them.
Define the multi-tenant data architecture: tenancy isolation, data residency posture, and per-tenant cost attribution.
Lead staged modernization toward the right mix of stores and patterns for transactional, analytical, and AI/ML use cases.
Own the architectural direction of the data pipeline and lake / lakehouse layer: ingestion, transformation, orchestration, and storage tiers.
Lead the move from homegrown pipelines to proven, industry-standard platforms.
Modernize legacy data-access patterns via incremental, strangler-fig migrations.
Drive hands-on prototypes, reference implementations, and in-repo guardrails.
Define the data, storage, and retrieval patterns the rest of engineering builds against.
Establish data quality, testing, lineage, and observability standards for pipelines and AI/ML serving.
Mentor engineers on schema discipline, modern data practices, and AI/ML-readiness patterns.
Make canonical decisions that are time-boxed, written, and defensible.
Use AI-assisted development tools as a force multiplier.
Partner with database engineering on production data health.
Partner with ML and application engineering on their data needs.
Partner with platform / infrastructure on reliability, disaster recovery, residency, and the multi-tenant operational posture.