About The Position

The Data Ingestion team is part of Block's AI, Data & Analytics organization and is responsible for building and operating the platforms that replicate and ingest data into Block's Lakehouse, powered by Databricks and Snowflake. The team owns Block's Change Data Capture (CDC) platform, streaming data connectors, and data loading infrastructure, ensuring that fresh, reliable data from production databases, event streams, and third-party sources is available for analytics, machine learning, and AI initiatives across Square, Cash App, and Afterpay.

As a Senior Software Engineer on the team, you will design and build the next generation of data ingestion infrastructure, including Kafka Iceberg connectors, database replication pipelines, and unified ingestion frameworks. You will drive the modernization of our CDC platform, help consolidate multiple ingestion paths into a cohesive architecture, and collaborate with partner teams across Block to ensure data flows reliably from source to Lakehouse. In this role, you will have a direct impact on the scalability, reliability, and cost-efficiency of Block's data ecosystem.

This role can be performed from any location in the US or Canada.

Requirements

  • 8+ years of experience in software engineering or data platform development, with a focus on building scalable data systems or distributed infrastructure.
  • Strong programming proficiency in languages such as Java, Python, Scala, or Go, with experience developing data frameworks, libraries, or services.
  • Hands-on experience with streaming data systems and technologies such as Apache Kafka, Kafka Connect, or similar distributed messaging platforms.
  • Solid understanding of Change Data Capture (CDC), database replication patterns, and data lake or Lakehouse architectures.
  • Experience with modern data storage formats and table formats such as Apache Iceberg or Delta Lake.
  • Experience with cloud-based data ecosystems (AWS, GCP, or Azure) and infrastructure-as-code tools.

Responsibilities

  • Design, build, and operate scalable data replication and ingestion pipelines that move data from production databases, event streams, and third-party sources into Block's Lakehouse.
  • Develop and enhance Kafka Iceberg connectors and data loading frameworks, enabling reliable, low-latency data delivery to Snowflake and Databricks.
  • Drive the modernization of Block's CDC platform, evaluating and implementing next-generation approaches for database replication, including cloud-native alternatives and Iceberg-based ingestion patterns.
  • Build self-service tooling and observability features that empower internal teams to onboard, monitor, and troubleshoot their own data pipelines with minimal support.
  • Collaborate with data engineering, platform infrastructure, and product teams to define data contracts, improve service encapsulation, and reduce tight coupling between operational databases and analytics consumers.
  • Contribute to the unification of Block's data ingestion architecture by identifying opportunities to consolidate overlapping systems and reduce infrastructure complexity.
  • Design and implement solutions for PII detection, masking, and privacy-compliant data handling within ingestion pipelines, ensuring sensitive data is properly classified, protected, and governed in accordance with Block's privacy policies and regulatory requirements (e.g., GDPR, CCPA).
  • Establish and promote best practices for data pipeline reliability, cost optimization, schema management, and compliance across the ingestion platform.

Benefits

  • Remote work
  • Medical insurance
  • Flexible time off
  • Retirement savings plans
  • Modern family planning


What This Job Offers

  • Job Type: Full-time
  • Career Level: Senior
  • Education Level: No education requirement listed
  • Number of Employees: 5,001-10,000 employees
