Principal Software Engineer, Data Engineering

Highspot • Seattle, WA

About The Position

Highspot is looking for a Principal Data Engineer to join its Data Platform team. This role defines the technical vision for high-scale data products that power customer-facing analytics, intelligent AI agents, and core product capabilities, and owns the architecture that serves these diverse needs from a unified foundation. The Principal Data Engineer will shape the overall data architecture and influence data strategy, acting as a bridge between upstream software engineering teams (data producers) and downstream engineering and AI teams (data consumers). The role requires leading technical execution and balancing pipeline architecture with advanced data modeling, query optimization, and data trust. The data engineering challenges are unique: the platform generates rich, deeply nested, document-oriented data from millions of enterprise interactions.

Requirements

Demonstrated depth in building production data platforms that serve multiple consumption patterns – you've gone beyond traditional BI to support real-time product features, AI/ML workloads, or customer-facing analytics from the same data foundation.
Deep experience with the impedance mismatch between document-oriented operational stores and analytical systems – you've dealt with nested, schema-evolving source data (MongoDB, DynamoDB, or similar) and have opinions on where flattening and transformation should live.
Hands-on experience with data quality and trust at scale – you've built or operated schema registries, data contracts, quality monitoring, or lineage systems in an environment where multiple teams depend on shared data products.
Track record of cost-conscious data architecture – you've optimized Snowflake (or comparable) warehouse spend, designed compute governance policies, or re-architected pipelines to materially reduce cost without sacrificing reliability.
Strong instinct for the bridge role: you're as comfortable pushing back on an upstream team's schema change as you are negotiating freshness SLAs with a downstream AI consumer.
8+ years of professional software engineering experience, with significant time spent on distributed, data-intensive production systems – including substantial depth in data pipeline and platform architecture.
Deep hands-on expertise with modern data technologies: Snowflake, Apache Kafka, Apache Flink, and CDC tooling (Debezium or similar).
Experience developing and operating cloud data infrastructure at enterprise scale (AWS preferred), including infrastructure-as-code (Terraform) and CI/CD automation.
Strong programming skills in Python, Java, and SQL. You write production-grade code, not just scripts.
A track record of designing performant data models that support fast, efficient querying for analytical and product-facing use cases.
Strong cross-functional communication skills – you work effectively with software engineers, data scientists, AI teams, and business stakeholders across organizational boundaries.
Experience mentoring engineers and building collaborative, high-performing teams.

Responsibilities

Architect the data platform, driving the technical direction for a scalable, reliable platform built on a medallion architecture that serves customer-facing analytics, reporting, and agentic AI from a unified foundation.
Build and optimize ingestion pipelines, designing robust CDC, real-time streaming (Kafka, Flink), and batch processing pipelines that transform complex, nested document-oriented operational data into clean analytical models at enterprise scale.
Build resilient ingestion and transformation layers that gracefully handle deeply nested, continuously evolving document schemas, deciding where to absorb complexity (ingestion, transformation, or query time) and making those tradeoffs explicit and sustainable.
Architect data products that support both traditional BI workloads (pre-aggregated dashboards, dimensional models for scorecards and reports) and emerging AI consumption patterns (low-latency retrieval, contextual assembly, freshness-sensitive agent queries).
Establish the data trust infrastructure that makes cross-team data consumption reliable: schema contracts with upstream producers, data quality monitoring, lineage tracking, freshness SLAs, and clear escalation paths when things break.
Own Snowflake warehouse optimization, compute governance, and cost-efficient pipeline design, building the practices and visibility so the team makes principled cost/performance tradeoffs.
Collaborate across organizational boundaries to align upstream software engineering teams and downstream analytics and AI teams around unified data strategies, shared contracts, and engineering standards.
Provide technical leadership and coaching to a diverse team of data engineers, championing best practices across the full spectrum of data engineering disciplines, from low-level pipeline architecture to sophisticated data modeling and analytical query performance.