Senior Data Acquisition Engineer

Bluefish AI
New York, NY
Hybrid

About The Position

As a Senior Data Acquisition Engineer, you will own and evolve the systems responsible for large-scale web data collection. You will design and maintain production-grade scraping and ingestion infrastructure that enables multiple teams to reliably add and operate data sources. This role sits within the Data Engineering team and focuses on building scalable, observable, and resilient systems that handle production data. You will work closely with data engineers and product teams to improve data pipelines and ensure data acquisition scales with the business. This role presents an exciting opportunity to shape the future of AI‑driven technologies and make meaningful contributions to real‑world applications. This role is based in our NYC office and follows a hybrid working policy.

Requirements

  • At least 5 years of experience building and operating production-grade backend, data, or platform systems, with hands-on experience in web data acquisition or scraping.
  • Experience operating systems in environments with frequent change, partial failure, and external constraints.
  • Product-oriented engineering mindset, with strong problem-solving skills and a sense of ownership in production environments.
  • Solid understanding of networking fundamentals: TLS/SSL behavior, timeouts, and failure modes.
  • Hands-on experience with browser automation tools.
  • Proficiency using CSS selectors and XPath for data extraction.
  • Experience designing systems with robust retry logic, error handling, testing, and observability.
  • Proficiency with SQL and NoSQL databases.
  • Experience designing and operating production systems on AWS, including VPC networking, service-to-service communication, monitoring, and on-call operations.
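As a gauge of the extraction skills listed above, here is a minimal illustration of XPath-based field extraction using only Python's standard library (the page fragment and field names are hypothetical; real scrapers typically use richer parsers with full XPath and CSS selector support):

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed product-page fragment for the demo.
html = """
<html>
  <body>
    <div class="product">
      <span class="name">Widget</span>
      <span class="price">19.99</span>
    </div>
    <div class="product">
      <span class="name">Gadget</span>
      <span class="price">24.50</span>
    </div>
  </body>
</html>
"""

root = ET.fromstring(html)

# ElementTree supports a useful subset of XPath: descendant search
# (.//) plus attribute-equality predicates, enough to target
# elements by class attribute.
names = [el.text for el in root.findall(".//span[@class='name']")]
prices = [float(el.text) for el in root.findall(".//span[@class='price']")]

print(names)   # ['Widget', 'Gadget']
print(prices)  # [19.99, 24.5]
```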

Nice To Haves

  • Experience building or maintaining browser extensions.
  • Exposure to browser internals or browser source code (e.g. debugging, patching, or extending browser behavior).
  • Background in large-scale crawling, scraping, or distributed data systems.

Responsibilities

  • Build a scalable web data acquisition platform used across teams.
  • Enable not just your team but other teams to ingest data more safely and efficiently.
  • Make tradeoffs between cost, reliability, and performance in key architecture decisions.
  • Create shared abstractions and tooling that make it easy to add, maintain, and operate scrapers in production.
  • Own system reliability, including smart retries, backoff strategies, error handling, and failure recovery.
  • Continuously improve resilience, correctness and maintainability of scraping and data ingestion infrastructure.
  • Build observability into data pipelines through logging, metrics, and alerting.
  • Monitor data quality.
  • Improve and scale end-to-end data pipelines.
  • Collaborate cross-functionally to support new data sources and evolving product requirements.
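The "smart retries and backoff strategies" responsibility above can be sketched as capped exponential backoff with full jitter (a minimal stdlib-only sketch; the function names and parameters are illustrative, not part of any existing codebase):

```python
import random
import time

def retry_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=30.0,
                       retryable=(ConnectionError, TimeoutError)):
    """Call fn(), retrying transient failures with capped exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure
            # Full jitter: sleep a random amount up to the capped backoff,
            # spreading retries out to avoid thundering-herd load spikes.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            time.sleep(random.uniform(0, delay))

# Usage: a hypothetical flaky fetch that succeeds on the third call.
calls = {"n": 0}
def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network error")
    return "payload"

print(retry_with_backoff(flaky_fetch, base_delay=0.01))  # payload
```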