Computational Linguist

Apple • Cupertino, CA

About The Position

Join Apple's Entity and App Resolution team, where we build the intelligent systems that help customers find and interact with exactly what they're looking for. Our work powers personalized search and resolution capabilities that improve user experiences. We're seeking a Computational Linguist to support our entity resolution systems through data-driven work. In this role, you'll help construct, generate, annotate, maintain, and analyze the data that power our products. You'll help build processes and pipelines to streamline data workflows. You will also be responsible for documenting and communicating processes, guidelines, and findings. You'll work across engineering, quality, localization, and UX teams, applying your linguistic expertise to ensure our systems work well across different contexts and locales. Your systematic and organized approaches will help to increase efficiency within and across teams.

Description

Data Generation & Management: Design, generate, and maintain high-quality datasets for entity resolution testing and evaluation; automate data generation pipelines where possible
Annotation Projects: Plan, manage, and execute annotation projects; create annotation guidelines, coordinate with annotators, ensure quality and consistency
Data Analysis: Clean, process, and analyze language data using linguistic expertise to identify patterns, edge cases, and system failures; surface insights that drive product improvements
Cross-Functional Collaboration: Work closely with Quality Engineering, UX researchers, engineers, and localization teams to ensure entity resolution meets quality standards and user needs
Localization Awareness: Work with internationalization to identify and address language-specific and cultural issues in entity resolution across different locales
Testing & Evaluation: Design systematic test cases and evaluation frameworks; analyze test results to identify linguistic patterns in failures
Documentation: Create clear, comprehensive documentation of scope, datasets, processes, findings, and recommendations for diverse audiences

Requirements

Master's degree in Linguistics, Computational Linguistics, or related field
Strong data skills: cleaning, processing, analyzing, and finding patterns in language data
Strong analytical skills: experience with quantitative and qualitative data analysis
Coding skills (e.g., Python, R) for data processing and analysis
3-5 years of industry experience
Experience with linguistic annotation
Excellent communication skills and ability to work cross-functionally
Strong organizational skills and documentation practices
Systematic thinking and attention to detail
Basic understanding of machine learning concepts

Nice To Haves

Familiarity with version control (Git or similar)
Familiarity with conversational systems, NLP, or search systems
Experience managing or participating in annotation projects
Experience with localization or multilingual issues in language technology
Project management experience (academic or professional)
Experience working with engineers or technical teams
