Principal Software Engineer - Data Hub

HubSpot
Cambridge, MA
Hybrid

About The Position

HubSpot’s Data Hub helps RevOps, marketing, sales, and customer teams turn fragmented data into actionable intelligence. We unify data across channels and tools, improve data quality, and activate it inside HubSpot so teams can run AI-powered demand generation, smarter campaigns and sales motions, agentic automations, and trustworthy reporting, all without needing to be data experts. We’re a product engineering team at the intersection of data engineering, ML, applied AI, and go-to-market, and we care as much about reliability, cost, and scale as we do about time-to-value and usability for marketers and sales reps.

We’re looking for a Principal Software Engineer to lead the next evolution of Data Hub as the backbone for data-driven demand generation. Principal Engineers at HubSpot are expected to be hands-on builders, strong partners to product and design, and multipliers for the broader engineering organization.

Requirements

  • Deep experience building large‑scale data systems with Apache Spark and modern table formats like Apache Iceberg, including efficient partitioning, clustering, and file layout for both heavy ingestion and low‑latency reads.
  • Ability to apply distributed systems principles and the CAP theorem pragmatically to design fault-tolerant, horizontally scalable services that balance availability, consistency, latency, and cost where it matters.
  • Ability to turn ambiguous business goals into clear data models, contracts, and SLAs across multiple storage and compute layers (e.g., Iceberg, warehouses, logs, CRM stores).

Nice To Haves

  • If you enjoy working at the intersection of data engineering, ML, applied AI, and commercial outcomes, and you like building platforms that make complex data approachable for non-experts, we’d love to talk.

Responsibilities

  • Own core pieces of our data lake and analytics stack (e.g., Iceberg, Spark, batch and streaming pipelines) that power demand gen, segmentation, and scoring at scale.
  • Design and evolve data systems that balance cost, latency, data freshness, and reliability, making explicit tradeoffs using concepts like CAP theorem, efficient partitioning, and storage layout.
  • Partner closely with PM, product analytics, and GTM leaders to shape commercially meaningful solutions: better lead scoring, funnel visibility, audience building, and campaign attribution for marketers and sales.
  • Help make Data Hub an AI‑agent‑forward platform, where curated, evergreen datasets automatically feed AI agents and reporting surfaces rather than requiring manual stitching or ad-hoc pipelines.
  • Own platform-scale outcomes: Influence technical direction across the Data Hub product line and shape the architecture for unified profiles, segmentation, and datasets that other teams can build on.
  • Be a high-leverage, hands-on builder: Write code and build systems while leading end-to-end delivery of high-impact, multi-quarter initiatives, setting standards for reliability, observability, testing, and incident response.
  • Lead through architecture and influence: Define reusable patterns for ingestion, transformation, quality, sync, and observability, and mentor senior engineers and tech leads.
  • Use AI code agents: Actively use AI-assisted development tools to speed iteration, reduce toil (e.g., scaffolding, tests, refactors), and improve code quality, while defining best practices for a human-in-the-loop approach.
  • Champion incremental, outcome-focused delivery: Break down big, ambiguous problems into incremental milestones that deliver value early and often, balancing long-term platform bets with clear business impact (ARR, adoption, usage, efficiency).
  • Raise the bar on engineering practices: Model strong habits around documentation, design reviews, testing, and observability, and help establish reliability and data quality standards so downstream AI agents and data activation use cases can trust the data they receive.

Benefits

  • Cash compensation for this role includes base salary, on-target commission for employees in eligible roles, and annual bonus targets under HubSpot’s bonus plan for eligible roles.
  • In addition to cash compensation, some roles are eligible to participate in HubSpot’s equity plan to receive restricted stock units (RSUs).
  • Some roles may also be eligible for overtime pay.
  • Individual compensation packages are tailored to your skills, experience, qualifications, and other job-related factors.
  • Benefits are also an important piece of your total compensation package.
  • Explore the benefits and perks HubSpot offers to help employees grow better.