Data Labeling Associate

Welocalize•San Francisco, CA

About The Position

The ideal candidate will have a foundational understanding of machine learning, data annotation, quality assurance, and natural language processing. They will play a pivotal role in updating our machine learning models and ensuring their efficacy. This role focuses primarily on US English data sets; however, familiarity with translation or multilingual data sets may be useful for future projects.

Welocalize is a leading technology-enabled provider of translation, localization, and AI-driven content solutions, helping businesses communicate, innovate, and grow globally. Specializing in complex and regulated industries, Welocalize delivers precise, scalable multilingual content through a combination of advanced AI technologies and expert human talent. At its core is Welocalize's AI-enabled OPAL platform, which transforms translation workflows by integrating machine translation (MT) and large language models (LLMs) to provide fast, accurate, and culturally relevant content in over 300 languages. Welocalize holds seven ISO certifications and is headquartered in New York, with offices around the world.

Requirements

Foundational understanding of machine learning, data annotation, quality assurance, and natural language processing.
Ability to work in a fast-paced, collaborative environment.
Excellent communication skills.
Familiarity with command-line tools and interfaces.
Strong analytical skills with the ability to identify patterns and anomalies.

Nice To Haves

Familiarity with translation or multilingual data sets, which may be useful for future projects.

Responsibilities

Update model training and test databases with new or amended synthetic text and image data.
Modify and refine machine learning data creation, annotation, and rating guidelines.
Initiate model training processes using internal tools and command-line interfaces.
Evaluate the performance of trained models to gauge their efficacy and readiness for deployment.
Design and develop test and training data sets according to criteria provided by the project manager and other full-time employees.
Handle data efficiently, ensuring its integrity throughout the workflow.
Engage in data relevance tasks, ensuring data sets are aligned with project goals.
Annotate data accurately, ensuring it adheres to set guidelines.
Conduct manual quality analysis of model results.
Recognize error patterns and report anomalies for further investigation.
Deliver detailed reports on findings for review by the User Experience Research team, covering areas such as utterance quality, LLM evaluation, ASR bug tracking, and customer pain points.
Implement basic quality control measures and ensure the reliability of processed data.
Utilize intermediate data analysis techniques to extract insights and inform decision-making.
Arbitrate discrepancies effectively, ensuring consistent data quality.
Apply basic knowledge of natural language processing and linguistics to data processing tasks.
Ensure linguistic accuracy in all processed and annotated data.