About The Position

Speechify is seeking a skilled Software Engineer to join the Data side of their AI team. This role is responsible for all aspects of data collection to support model training operations. The team is capable of building high-quality datasets at petabyte-scale and low cost through a tight integration of infrastructure, engineering, and research work. The engineer will help find new sources of audio data, bring it into the ingestion pipeline, and operate and extend the cloud infrastructure for this pipeline, which currently runs on GCP and is managed with Terraform. Collaboration with Scientists and AI Team leadership is key to defining the dataset roadmap and improving the cost/throughput/quality frontier for next-generation models and products.

Requirements

  • BS/MS/PhD in Computer Science or a related field.
  • 5+ years of industry experience in software development.
  • Proficiency with Bash/Python scripting in Linux environments.
  • Proficiency in Docker and Infrastructure-as-Code concepts, and professional experience with at least one major cloud provider (we use GCP).
  • Ability to handle multiple tasks and adapt to changing priorities.
  • Strong communication skills, both written and verbal.

Nice To Haves

  • Experience with web crawlers and large-scale data processing workflows.

Responsibilities

  • Be scrappy in finding new sources of audio data and bringing them into our ingestion pipeline.
  • Operate and extend the cloud infrastructure for our ingestion pipeline, currently running on GCP and managed with Terraform.
  • Collaborate closely with our Scientists to shift the cost/throughput/quality frontier, delivering richer data at larger scale and lower cost to power our next-generation models.
  • Collaborate with others on the AI Team and Speechify Leadership to craft the AI Team’s dataset roadmap to power Speechify’s next-generation consumer and enterprise products.

Benefits

  • Competitive salaries
  • Bonus
  • Equity