Senior Data Engineer

Marble Technologies, Lincoln, NE

About The Position

Marble is a technology company founded to revolutionize the food processing industry. We are seeking a full-time Senior Data Engineer who is ready for a challenge and eager to design, implement, and support automation solutions that are transforming the industry. As part of the Marble team, you will leverage cutting-edge technologies to develop the next generation of automated solutions for food processing, enhancing resilience in the food supply chain.

Job Summary

As a Senior Data Engineer at Marble, you will own the design and performance of the data pipelines that power everything from real-time classification dashboards to ML training datasets to operational analytics for production facilities. You will work closely with the Software, Infrastructure, and Machine Learning teams to ensure data flows through our pipelines securely, reliably, and at scale. You will design for both high-throughput real-time ingestion and large-scale batch processing across on-prem edge nodes and AWS.

Requirements

  • B.S. or M.S. in Computer Science, Data Engineering, or related field
  • 4+ years of experience building production-grade data pipelines or distributed systems
  • Strong proficiency in Python and SQL
  • Production experience with at least one major distributed compute or orchestration framework, such as Apache Spark, Ray, or Apache Airflow (2+ years preferred)
  • Experience building streaming pipelines or real-time systems (Kafka, NATS, Redis Streams, or similar)
  • Deep familiarity with AWS cloud services (S3, Lambda, IAM, EC2, Glue, etc.)
  • Experience with PostgreSQL, MongoDB, ClickHouse, or other relational, columnar, or NoSQL systems
  • Strong understanding of data modeling, partitioning, schema evolution, and performance tuning
  • Understanding of data quality, lineage, orchestration, and governance
  • Ability to design systems in hybrid environments (on-prem + cloud)
  • Excellent communication, documentation, and teamwork skills

Nice To Haves

  • Experience with NATS JetStream, Kafka, or high-throughput messaging systems
  • Familiarity with GPU-based CV pipelines, ML datasets, or annotation workflows
  • Experience with ClickHouse Materialized Views, Replicated Tables, or S3-backed storage
  • Experience working in a regulated, safety-critical, or high-uptime environment
  • Experience with Nomad, Consul, Vault, or HashiCorp ecosystem

Responsibilities

  • Architect and build scalable ETL/ELT pipelines for both batch and streaming workloads
  • Design real-time ingestion and transformation workflows integrating NATS JetStream and distributed microservices
  • Develop robust data models and ETL layers for ClickHouse, enabling high-performance analytics and ML feature extraction
  • Manage and optimize data storage across AWS S3, ClickHouse, and operational datasets generated on-prem
  • Build automation workflows for labeling data, CV pipeline pre-annotation, dataset generation, and versioning
  • Ensure data quality, validation, integrity, and lineage, including automated tests and monitoring across pipelines
  • Collaborate with ML and backend teams to deliver pipelines for training datasets and annotation tools
  • Implement scalable compute workloads for large dataset transformations
  • Define and enforce data governance best practices, including schema evolution, retention policies, and compliance requirements
  • Monitor and improve data pipeline performance across multi-region environments