Senior Data Engineer II

Remitly
Raleigh, NC

About The Position

Are you an experienced developer with a 'can-do' attitude and enthusiasm that inspires others? Do you enjoy being part of a team that works with a diverse range of technology? Join our team to help build state-of-the-art research tools. Our Data Science teams focus on extracting key information, such as mentioned entities, sentiment, data enrichments, and predictive insights, to build best-in-class data and news streams relied on by our global customer base. This role leads multimodal model strategy (vision + language + layout) and multi-agent collaboration (task decomposition, verification, conflict reconciliation, feedback loops), and plans future customized training and ongoing optimization of models.

Requirements

  • Bachelor's degree or above in Computer Science, Software Engineering, Information Systems, Data Engineering, or a related field.
  • 3–6 years in data, platform, or backend engineering; practical exposure to ML or multimodal data projects.
  • Strong experience with data modeling, batch/streaming processing, and distributed systems fundamentals.
  • Experience with data cleaning and format transformation; multimodal sample construction and efficient storage.
  • Strong understanding of multimodal training data patterns: balancing, segmentation, structural tagging, negative samples, and quality metrics.
  • Experience with observability: an integrated closed loop of logs, metrics, and tracing.
  • SQL, data warehousing, object storage, and columnar and vector index structures.
  • Robust Python experience (data processing, concurrency/async, performance profiling, packaging, and environment isolation).
  • Linux CLI and bash scripting: files, permissions, and processes; network and I/O diagnostics; automation and troubleshooting.

Nice To Haves

  • Experience with cloud-based data platforms (e.g., AWS, GCP, Azure) for large-scale machine learning workflows.
  • Familiarity with MLOps tools and practices (e.g., MLflow, Kubeflow, Airflow).

Responsibilities

  • Pipelines & preprocessing: scalable cleaning, OCR / layout normalization, early quality gating.
  • Labeling + active learning loop: strategic sampling, quality scoring, continuous feedback integration.
  • Training & inference engineering: sample automation, feature generation, resource orchestration, reliability & monitoring.
  • Serving & optimization: multi‑model routing, caching / indexing, elastic scaling, performance & cost efficiency.