Senior Data Engineer II

RELX · Raleigh, NC
22h

About The Position

Are you an experienced developer with a ‘can do’ attitude and enthusiasm that inspires others? Do you enjoy being part of a team that works with a diverse range of technology? This position performs complex research and data engineering assignments within an engineering functional area or product line, and provides direct input to project plans, schedules, and methodology in the development of cross-functional products. This position performs data engineering design, typically across multiple systems; mentors more junior members of the team; and talks to users/customers and translates their requests into solutions.

Requirements

  • Bachelor’s or above in Computer Science, Software Engineering, Information Systems, Data Engineering or a related field.
  • 5+ years of experience in data / platform or backend engineering; practical ML or multimodal data project exposure.
  • Strong experience with data modeling, batch / streaming processing, distributed systems fundamentals.
  • Experienced with data cleaning & format transformation; multimodal sample construction & efficient storage.
  • Strong understanding of multimodal training data patterns: balancing, segmentation, structural tagging, negative samples & quality metrics.
  • Experienced with observability: integrated logs / metrics / tracing in a closed loop.
  • SQL, data warehousing, object storage, columnar & vector index structures.
  • Robust Python experience (data processing, concurrency / async, performance profiling, packaging & environment isolation).
  • Linux CLI & bash scripting: files / permissions / processes, network & IO diagnostics, automation and troubleshooting.

Nice To Haves

  • Experience with cloud-based data platforms (e.g., AWS, GCP, Azure) for large-scale machine learning workflows.
  • Familiarity with MLOps tools and practices (e.g., MLflow, Kubeflow, Airflow) for

Responsibilities

  • Pipelines & preprocessing: scalable cleaning, OCR / layout normalization, early quality gating.
  • Labeling + active learning loop: strategic sampling, quality scoring, continuous feedback integration.
  • Training & inference engineering: sample automation, feature generation, resource orchestration, reliability & monitoring.
  • Serving & optimization: multi‑model routing, caching / indexing, elastic scaling, performance & cost efficiency.