Senior Data Engineer

Marble Technologies, Lincoln, NE

About The Position

Marble is a technology company founded to revolutionize the food processing industry. We are seeking a full-time Senior Data Engineer who is ready for a challenge and eager to design, implement, and support automation solutions that are transforming the industry. As part of the Marble team, you will leverage cutting-edge technologies to develop the next generation of automated solutions for food processing, enhancing resilience in the food supply chain.

Job Summary

As a Senior Data Engineer at Marble, you will own the design and performance of the data pipelines that power everything from real-time classification dashboards to ML training datasets to operational analytics for production facilities. You will work closely with the Software, Infrastructure, and Machine Learning teams to ensure data flows through our pipelines securely, reliably, and at scale. You will design for both high-throughput real-time ingestion and large-scale batch processing across on-prem edge nodes and AWS.

Requirements

  • B.S. or M.S. in Computer Science, Data Engineering, or related field
  • 4+ years of experience building production-grade data pipelines or distributed systems
  • Strong proficiency in Python and SQL
  • Production experience with at least one major distributed compute or orchestration framework, such as Apache Spark, Ray, or Apache Airflow (2+ years preferred)
  • Experience building streaming pipelines or real-time systems (Kafka, NATS, Redis Streams, or similar)
  • Deep familiarity with AWS cloud services (S3, Lambda, IAM, EC2, Glue, etc.)
  • Experience with PostgreSQL, MongoDB, ClickHouse, or other relational, columnar, or NoSQL systems
  • Strong understanding of data modeling, partitioning, schema evolution, and performance tuning
  • Understanding of data quality, lineage, orchestration, and governance
  • Ability to design systems in hybrid environments (on-prem + cloud)
  • Excellent communication, documentation, and teamwork skills

Nice To Haves

  • Experience with NATS JetStream, Kafka, or high-throughput messaging systems
  • Familiarity with GPU-based CV pipelines, ML datasets, or annotation workflows
  • Experience with ClickHouse Materialized Views, Replicated Tables, or S3-backed storage
  • Experience working in a regulated, safety-critical, or high-uptime environment
  • Experience with Nomad, Consul, Vault, or HashiCorp ecosystem

Responsibilities

  • Architect and build scalable ETL/ELT pipelines for both batch and streaming workloads
  • Design real-time ingestion and transformation workflows integrating NATS JetStream and distributed microservices
  • Develop robust data models and ETL layers for ClickHouse, enabling high-performance analytics and ML feature extraction
  • Manage and optimize data storage across AWS S3, ClickHouse, and operational datasets generated on-prem
  • Build automation workflows for labeling data, CV pipeline pre-annotation, dataset generation, and versioning
  • Ensure data quality, validation, integrity, and lineage, including automated tests and monitoring across pipelines
  • Collaborate with ML and backend teams to deliver pipelines for training datasets and annotation tools
  • Implement scalable compute workloads for large dataset transformations
  • Define and enforce data governance best practices, including schema evolution, retention policies, and compliance requirements
  • Monitor and improve data pipeline performance across multi-region environments