Staff Software Engineer (Data) - Rockerbox

DoubleVerify
$128,000 - $230,000

About The Position

We are looking for a Staff Data Engineer to shape the future of our data platform with a focus on small data at scale. While many companies over-index on heavyweight distributed systems, we believe in the power of efficient, local-first, columnar engines like DuckDB to process and analyze data quickly, reliably, and cost-effectively. As a Staff Data Engineer, you will set the technical direction for how our teams ingest, transform, and serve data, bridging the gap between lightweight embedded tools and cloud-scale systems. You’ll be hands-on in building pipelines, while also mentoring engineers and setting best practices across the organization.

Requirements

  • Deep expertise in SQL (window functions, CTEs, optimization).
  • Strong Python skills with data libraries.
  • Proficiency with DuckDB (extensions, Parquet/Iceberg integration, embedding in pipelines).
  • Hands-on with columnar formats (Parquet, Arrow, ORC) and schema evolution.
  • Expertise in Kubernetes and Helm.
  • Cloud storage experience (AWS S3, GCS).
  • Experience with semantic layer frameworks (CubeJS).
  • CI/CD tooling (GitHub Actions, Terraform, Docker/Kubernetes).
  • Track record of leading architecture decisions and mentoring teams.
  • Ability to set standards for maintainability and developer experience.

Nice To Haves

  • Experience with serverless and embedded analytics (e.g., DuckDB WASM in production).
  • Exposure to data versioning (Delta Lake, Iceberg, Hudi).
  • Knowledge of ML/LLM data prep workflows and vector database integrations.
  • Previous experience building hybrid stacks (local development + cloud warehouse production).

Responsibilities

  • Architect and Build Data Pipelines
      • Design and implement data processing workflows using DuckDB, Polars, and Arrow/Parquet.
      • Balance small-data local pipelines with cloud data warehouse backends (Snowflake, etc.).
  • Champion the Small Data Mindset
      • Advocate for efficient, vectorized, local-first approaches where appropriate.
      • Drive best practices for designing reproducible and testable data workflows.
  • Collaborate Cross-Functionally
      • Partner with data science, professional services, and product engineering teams to define semantic data layers.
      • Provide technical leadership in how data is versioned, validated, and surfaced for downstream use.
  • Operational Excellence
      • Establish standards for CI/CD, observability, and reliability in data pipelines.
      • Automate workflows and optimize data layout for performance and cost efficiency.
  • Mentor & Lead
      • Serve as a thought leader in the organization, guiding engineers on when to use lightweight tools vs. distributed platforms.
      • Mentor senior and mid-level data engineers to accelerate their growth.

Benefits

  • Bonus/commission eligibility.
  • Equity options.
  • Comprehensive benefits package.