Market Data Engineer

BHFT
Remote

About The Position

The Data Engineering team is responsible for designing, building, and maintaining the Market Data Platform — a lakehouse infrastructure spanning the full path from raw exchange feeds to reliable, petabyte-scale data for research, backtesting, and real-time trading.

Requirements

  • 5+ years building production-grade data systems, with proven expertise architecting and launching data lakes / lakehouses from scratch.
  • Hands-on experience with Apache Iceberg (or comparable table formats — Delta / Hudi): partitioning, schema evolution, snapshots, compaction, and catalog operations; familiarity with Apache Arrow for zero-copy, columnar in-memory interchange.
  • Experience with market data and/or network packet capture — decoding pcap, exchange feed protocols (ITCH, FIX/FAST, multicast UDP), order-book reconstruction, and time-series at scale (strong plus; willingness to learn required).
  • Experience normalizing market data from multiple vendors — e.g. OneTick, Refinitiv/Reuters, Bloomberg, ICE — into a unified schema and symbology (strong plus).
  • Expert-level Python (incl. Polars and/or PySpark); Rust a strong plus (relevant for high-performance capture/decoding).
  • Modern orchestration (Airflow) and distributed processing (Apache Spark).
  • Advanced SQL: complex aggregations, window functions, query optimization, partition pruning.
  • Solid fundamentals in Linux, containerization (Docker, Kubernetes / EKS), and cloud object storage (AWS S3).
  • DevOps & observability: CI/CD, infrastructure-as-code (Terraform), GitOps (ArgoCD), and metrics/dashboards/alerting (Grafana, Prometheus).
  • Strong grasp of structured, unstructured, and binary data, and of storage optimization: partitioning, compression, cost management.
  • English fluency for documentation and collaboration in an international team.

Responsibilities

  • Own the full capture path from wire to lake: decode and normalize raw exchange feeds (pcap, multicast UDP / ITCH / FIX) and vendor sources (OneTick, Refinitiv, Bloomberg, ICE) into a unified canonical model with nanosecond timestamps.
  • Build batch + stream pipelines (Airflow, Spark, dbt) for tick and reference data.
  • Own L2/L3 order-book reconstruction with gap handling.
  • Provide Python and Rust producer SDKs for internal feed handlers.
  • Own the Iceberg-over-S3 lakehouse: design partitioning, sort orders, and row-group layout for fast scans; manage schema evolution, snapshots, time travel, compaction, and TTL.
  • Maintain reference data as slowly-changing tables with point-in-time correctness for backtests.
  • Drive storage cost optimization via compaction, tiering, and snapshot expiry.
  • Build libraries for schema management, data contracts, validation, and lineage on top of the Iceberg catalog.
  • Develop shared access services (Spark + Polars) so research, backtesting, and trading share one normalized data layer, including gap detection and pcap-vs-lake reconciliation.
  • Embed monitoring, alerting, SLAs/SLOs, and CI/CD across capture and pipeline layers on Kubernetes (EKS).
  • Own data-quality dashboards and incident runbooks for the capture fleet.
  • Partner with Quant Research, Data Science, Backend, and DevOps to translate requirements into platform capabilities and champion market-data engineering best practices.

Benefits

  • Compensation for health insurance
  • Compensation for sports
  • Compensation for professional development