Data Engineer, Applied Machine Learning

WellSaid Labs

Posted:

August 25, 2023

Remote

Job Commitment

Full-time

Experience Level

Mid Level

Workplace Type

Remote

Job Function

Dev & Engineering

This job is closed

About the position

The Data Engineer role sits on the Applied Machine Learning team at WellSaid Labs, which improves and maintains ML solutions with a focus on the text-to-speech service. As a Data Engineer, you will be responsible for improving ML services, including testing frameworks, customer research, and model improvements. You will also gather and organize datasets, train and deploy ML models, and evaluate their performance. The role requires coding skills and critical thinking about language and audio dynamics to build high-quality voices and Studio/API experiences.

Responsibilities

Improve and maintain ML solutions, including creating test datasets and metrics, prioritizing model updates, training new models, coordinating releases, and educating customers on new capabilities.
Own the strategy and development of testing frameworks, customer research, and model improvements for the text-to-speech service.
Add new datasets, train, deploy, and evaluate new models, and design experiments and algorithms for solving TTS challenges.
Build automated systems for evaluating ML performance, such as accuracy, consistency, and customer acceptance.
Summarize findings into compelling reports and collaborate with the Platform and Applied ML teams to build solutions.
Draw on familiarity with Text-to-Speech technology to query databases, label and prepare data, craft workflows for crowd-sourced evaluations, and report metrics.
Work directly with text and audio data: gather, compile, and organize datasets, prepare data for model training, evaluate results, and debug problematic data.
Train and deploy ML models, incorporating new data, monitoring training metrics, debugging failing code, and deploying models for customer use.
Evaluate ML models, consider causation or correlation between training data and ML predictions, design ML experiments, gather and evaluate metrics, and design evaluation tools for measuring pronunciation accuracy, naturalness, and text normalization coverage.
Take on additional research projects, such as exploring interesting data or use cases, alternative services and solutions, internal process improvements, and new quality evaluation exercises.
Write and execute code to carry out these tasks, and think critically about language, dialect, pronunciation, phonemics, and audio dynamics to build high-quality voices and Studio/API experiences.
Apply ML concepts and best practices when managing datasets and metrics, developing tools for data evaluation and analysis, and shipping software releases, with consideration for customer impact and ethical implementations of AI.
Manage project expectations, communicate plans, project statuses, and results, work with various data types, build analysis tools, establish success criteria for data-driven projects, and deploy ML models for non-technical audiences.
Build and document new processes, especially in ML pipelines, understand the importance of data preparation, data visualization, and metrics for ML assessment, and analyze ML results.
Support software and feature releases and work closely with Product teams; fluency in Spanish or French is a bonus.
Curiosity and interest in linguistics and acoustics (bonus).
Study of Deep Learning and application of models to solve technical challenges (bonus).

Requirements

Experienced Data Engineer working in Applied Machine Learning
Familiarity with Text-to-Speech technology
Experience with database querying, data labeling, and data preparation
Ability to craft workflows for crowd-sourcing evaluations
Proficiency in metrics reporting
Strong understanding of ML concepts and best practices
History of managing datasets and metrics in an ML capacity
Coding experience developing tools for data evaluation and analysis
Experience with software releases and considerations for customer impact and ethical implementations of AI
Ability to manage project expectations and communicate plans, statuses, and results
Experience with a wide array of data types and establishing success criteria for data-driven projects
Ability to build and deploy ML models for non-technical audiences
Strong understanding of data preparation, data visualization, and ML assessment
Familiarity with software and feature releases, and experience working closely with Product teams
Fluent in Spanish or French (bonus)
Curiosity and interest in linguistics and acoustics (bonus)
Knowledge of Deep Learning and application of models to solve technical challenges (bonus)