Data Engineer, Applied Machine Learning
WellSaid Labs · Posted: August 25, 2023 · Remote
About the position
This role is for a Data Engineer on the Applied Machine Learning team at WellSaid Labs. The team improves and maintains ML solutions, with a focus on the text-to-speech service. As a Data Engineer, you will be responsible for improving ML services, including testing frameworks, customer research, and model improvements. You will also gather and organize datasets, train and deploy ML models, and evaluate their performance. The role requires coding skills and a critical understanding of language and audio dynamics to build high-quality voices and Studio/API experiences.
Responsibilities
- Improve and maintain ML solutions, including creating test datasets and metrics, prioritizing model updates, training new models, coordinating releases, and educating customers on new capabilities.
- Own the strategy and development of testing frameworks, customer research, and model improvements for the text-to-speech service.
- Add new datasets, train, deploy, and evaluate new models, and design experiments and algorithms for solving TTS challenges.
- Build automated systems for evaluating ML performance, such as accuracy, consistency, and customer acceptance.
- Summarize findings into compelling reports and collaborate with the Platform and Applied ML teams to build solutions.
- Draw on familiarity with Text-to-Speech technology, database querying, data labeling and preparation, crafting workflows for crowd-sourcing evaluations, and metrics reporting.
- Work directly with text and audio data: gather, compile, and organize datasets, prepare data for model training, evaluate results, and debug problematic data.
- Train and deploy ML models, incorporating new data, monitoring training metrics, debugging failing code, and deploying models for customer use.
- Evaluate ML models, consider causation or correlation between training data and ML predictions, design ML experiments, gather and evaluate metrics, and design evaluation tools for measuring pronunciation accuracy, naturalness, and text normalization coverage.
- Take on additional research projects, such as exploring interesting data or use cases, alternative services and solutions, internal process improvements, and new quality evaluation exercises.
- Write and execute code to carry out these tasks, and think critically about language, dialect, pronunciation, phonemics, and audio dynamics to build high-quality voices and Studio/API experiences.
- Apply ML concepts and best practices when managing datasets and metrics, developing tools for data evaluation and analysis, and releasing software, with consideration for customer impact and ethical implementations of AI.
- Manage project expectations, communicate plans, project statuses, and results, work with various data types, build analysis tools, establish success criteria for data-driven projects, and deploy ML models for non-technical audiences.
- Build and document new processes, especially in ML pipelines, understand the importance of data preparation, data visualization, and metrics for ML assessment, and analyze ML results.
- Work closely with Product teams on software and feature releases; fluency in Spanish or French is a bonus.
- Curiosity and interest in linguistics and acoustics (bonus).
- Study of Deep Learning and application of models to solve technical challenges (bonus).
Requirements
- Experienced Data Engineer working in Applied Machine Learning
- Familiarity with Text-to-Speech technology
- Experience with database querying, data labeling, and data preparation
- Ability to craft workflows for crowd-sourcing evaluations
- Proficiency in metrics reporting
- Strong understanding of ML concepts and best practices
- History of managing datasets and metrics in an ML capacity
- Coding experience developing tools for data evaluation and analysis
- Experience with software releases and considerations for customer impact and ethical implementations of AI
- Ability to manage project expectations and communicate plans, statuses, and results
- Experience with a wide array of data types and establishing success criteria for data-driven projects
- Ability to build and deploy ML models for non-technical audiences
- Strong understanding of data preparation, data visualization, and ML assessment
- Familiarity with software and feature releases, and experience working closely with Product teams
- Fluent in Spanish or French (bonus)
- Curiosity and interest in linguistics and acoustics (bonus)
- Knowledge of Deep Learning and application of models to solve technical challenges (bonus)
Benefits
- Competitive salary and stock options
- Full medical, dental, and vision insurance
- Matching 401(k) plan
- Generous vacation policy/paid time off
- Parental leave
- Learning & development stipend
- Home office stipend