About The Position

Innodata (NASDAQ: INOD) is a leading data engineering company. With more than 2,000 customers and operations in 13 cities around the world, we are the AI technology solutions provider of choice for 4 out of 5 of the world’s biggest technology companies, as well as for leading companies across financial services, insurance, technology, law, and medicine. By combining advanced machine learning and artificial intelligence (ML/AI) technologies, a global workforce of subject matter experts, and a high-security infrastructure, we’re helping bring the promise of clean, optimized digital data to every industry. Innodata offers a powerful combination of digital data solutions and easy-to-use, high-quality platforms. Our global workforce includes over 3,000 employees in the United States, Canada, the United Kingdom, the Philippines, India, Sri Lanka, Israel, and Germany. We’re poised for a period of explosive growth over the next few years.

Requirements

  • Advanced proficiency in Python for backend and large-scale data processing
  • Strong experience building and managing big data pipelines in production environments
  • Hands-on expertise with workflow orchestration tools such as Airflow or Google Cloud Composer
  • Proven experience in batch and streaming data processing using Apache Spark and Apache Beam (Dataflow); see the pipeline sketch after this list
  • Experience designing and operating event-driven systems using Pub/Sub
  • Strong understanding of distributed systems architecture and scalability patterns
  • Experience managing globally distributed, low-latency datasets
  • Hands-on experience with NoSQL databases and/or Google Cloud Spanner
  • Strong knowledge of system reliability, fault tolerance, and performance optimization
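As a reference point for the Spark/Beam and Pub/Sub items above, here is a minimal sketch of the kind of streaming pipeline those requirements describe: an Apache Beam job that reads from a Pub/Sub topic, applies fixed windowing, and counts messages per window. The project and topic names are hypothetical placeholders, and a production job would target the DataflowRunner rather than running locally.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def run():
        # streaming=True marks the source as unbounded; in production the
        # runner would be DataflowRunner with project/region flags set.
        options = PipelineOptions(streaming=True)

        with beam.Pipeline(options=options) as p:
            (
                p
                # "projects/example-project/topics/events" is a placeholder topic.
                | "Read" >> beam.io.ReadFromPubSub(
                      topic="projects/example-project/topics/events")
                | "Decode" >> beam.Map(lambda msg: msg.decode("utf-8"))
                # Fixed 60-second windows let the count fire on an unbounded stream.
                | "Window" >> beam.WindowInto(beam.window.FixedWindows(60))
                | "Count" >> beam.combiners.Count.Globally().without_defaults()
                | "Print" >> beam.Map(print)
            )

    if __name__ == "__main__":
        run()

The same pipeline shape runs in batch by swapping ReadFromPubSub for a bounded source such as ReadFromText.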

Nice To Haves

  • Proficiency in Go, Java, or Scala
  • Experience with Kafka or Flume for streaming ingestion (a minimal consumer sketch follows this list)
  • Deep familiarity with the Google Cloud Platform ecosystem
  • Experience with production monitoring, logging, and observability frameworks
  • Exposure to high-availability, multi-region deployments
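For the Kafka item above, a minimal ingestion sketch using the kafka-python client; the broker address, topic name, and group id are hypothetical placeholders:

    from kafka import KafkaConsumer  # kafka-python client

    # Broker, topic, and consumer group below are placeholders.
    consumer = KafkaConsumer(
        "events",
        bootstrap_servers="localhost:9092",
        group_id="ingestion-demo",
        auto_offset_reset="earliest",  # replay from the start if no committed offset
    )

    for record in consumer:
        # Each record exposes topic, partition, offset, key, and raw value bytes.
        print(record.topic, record.partition, record.offset, record.value)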

Responsibilities

  • Design, build, and optimize scalable data pipelines for batch and real-time processing (an orchestration sketch follows this list)
  • Develop and maintain event-driven architectures for high-throughput systems
  • Ensure data reliability, performance, and low-latency processing across distributed environments
  • Collaborate with data scientists and application teams to enable analytics and AI use cases
  • Implement best practices in performance tuning, monitoring, and cost optimization
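As a sketch of the orchestration side of these responsibilities, here is a minimal Airflow DAG (Airflow 2.4+ syntax) that wires three placeholder tasks into a daily batch pipeline; the DAG id and bash commands are hypothetical stand-ins for real job submissions:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # A hypothetical daily extract-transform-load pipeline; the echo commands
    # stand in for real Spark/Beam job submissions.
    with DAG(
        dag_id="daily_batch_pipeline",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",  # the "schedule" parameter requires Airflow 2.4+
        catchup=False,
    ) as dag:
        extract = BashOperator(task_id="extract", bash_command="echo extract")
        transform = BashOperator(task_id="transform", bash_command="echo transform")
        load = BashOperator(task_id="load", bash_command="echo load")

        extract >> transform >> load  # linear dependency chain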