AI Data Infrastructure Engineer

Bright Vision Technologies
New York, NY
Remote

About The Position

Bright Vision Technologies is a software development company that builds solutions to help businesses automate and optimize their operations. We use modern technologies to create scalable, secure, and user-friendly applications. As we continue to grow, we’re looking for a skilled AI Data Infrastructure Engineer to join our team and contribute to our mission of transforming business processes through technology. The role offers strong career growth potential at an established, well-respected organization.

Requirements

  • Bachelor’s or Master’s degree in Computer Science or a related field.
  • Six or more years of data engineering experience, with significant work supporting ML or AI workloads.
  • Strong proficiency in Python and at least one JVM or systems language.
  • Deep experience with modern data processing frameworks such as Spark, Ray, or Beam.
  • Hands-on experience operating petabyte-scale storage and pipeline systems.
  • Strong understanding of distributed systems, data modeling, and storage formats.
  • Experience with dataset versioning, lineage, and reproducibility for ML workflows.
  • Familiarity with high-throughput data loading for accelerator-based training.
  • Strong software engineering practices including testing, CI/CD, and code review.
  • Excellent communication and cross-functional collaboration skills.

Nice To Haves

  • Experience with multimodal datasets at large scale.
  • Familiarity with data quality tooling and dataset evaluation methodology.
  • Exposure to privacy-preserving data systems and regulated data handling.
  • Open-source contributions to data infrastructure projects.
  • Experience supporting frontier model training pipelines.

Responsibilities

  • Design and operate large-scale data pipelines supporting AI training, evaluation, and continual improvement workflows.
  • Build ingestion systems for diverse modalities including text, image, audio, video, and structured signals.
  • Implement data cleaning, deduplication, filtering, and quality assurance at petabyte scale.
  • Develop dataset versioning, lineage, and provenance tracking systems suitable for reproducible training.
  • Build high-throughput data loading systems that maximize GPU utilization during training.
  • Implement labeling workflows, active learning pipelines, and human-in-the-loop data improvement systems.
  • Design storage architectures balancing cost, throughput, and latency across data tiers.
  • Build evaluation dataset construction pipelines with strict integrity and contamination controls.
  • Implement data privacy, redaction, and consent enforcement throughout the pipeline.
  • Collaborate with ML researchers and engineers to align data systems with model development needs.
  • Drive observability of data quality, drift, and pipeline health across the AI data estate.
  • Optimize cost and performance through compression, format selection, and caching strategies.
  • Document data systems, schemas, and operational procedures for broad internal use.
  • Stay current with AI data infrastructure research and emerging open-source tools.

Benefits

  • Competitive base salary commensurate with experience, plus benefits.