Language Engineer, Artificial General Intelligence - Data Services

Amazon • Bellevue, WA

About The Position

The Amazon Artificial General Intelligence (AGI) Data Services organization is looking for a Language Engineer with experience in dataset construction, linguistic annotation, dialog/semantic schemas, and automatic processing of large datasets. You will play a critical role in driving innovation and advancing the state of the art in natural language processing and machine learning. You will work closely with cross-functional teams, including product managers, engineers, and data scientists, to ensure that our AI systems are aligned with human policies and preferences.

Requirements

Experience owning and executing language data collection projects, including developing guidelines, label sets, and annotation workflows
Master's degree or higher in a relevant field (Computational Linguistics or an equivalent field involving computational analysis)
2+ years of experience in computational linguistics, language data processing, or AI data creation
Experience with language data annotation systems and other forms of data markup
Proficiency with scripting languages such as Python
Experience working with speech, text, and multimodal data in multiple languages
Excellent communication and strong organizational skills; highly detail-oriented
Comfortable working in a fast-paced, highly collaborative, dynamic work environment

Nice To Haves

PhD in Computational Linguistics (or an equivalent field with a computational emphasis)
Expertise in bootstrapping AI data collections for quickly evolving requirements
Extensive experience working with speech, text, and multimodal data in multiple languages
Experience in data creation for complex agentic workflows
Practical experience with machine learning and technical concepts such as APIs
Practical knowledge of version control and agile development; familiarity with database queries and data analysis processes (SQL, R, MATLAB, etc.)

Responsibilities

Design data collection/creation tasks in response to science needs: author instructions, define and implement quality targets and mechanisms, provide day-to-day coordination of data collection efforts (including planning, scheduling, and reporting), and be responsible for the final deliverables
Analyze and extract language-related insights from large amounts of data
Build tools or tool prototypes for data analysis or data authoring, using Python or another scripting language
Use modeling tools to bootstrap or test new functionalities
Collaborate with scientists and software engineers to evaluate performance of language models
Handle competing requests from a range of data customers

Benefits

Health insurance (medical, dental, vision, prescription, Basic Life and AD&D insurance, an option for supplemental life plans, EAP, mental health support, a medical advice line, Flexible Spending Accounts, and adoption and surrogacy reimbursement coverage)
401(k) matching
Paid time off
Parental leave