Ontologist II - AMZ20452.4

Amazon
Santa Barbara, CA
$82,700 - $144,700

About The Position

1. Build and maintain scalable data pipelines using extract, transform, and load (ETL) software including Pentaho Data Integration, Amazon Business Data Technologies Cradle, and Amazon Knowledge Graph Data Lake to perform data cleaning and manipulation on large-scale datasets. Design and build solutions by leveraging off-the-shelf services like AWS Glue; programming languages including JavaScript, SQL, SparkSQL, and Python; custom-made tools including Graphiq Imports and Data Lake S3 Crawler; and large language models (LLMs) like Cedric Personas and LLM Batch Inference. Analyze and optimize pipeline performance through systematic monitoring and troubleshooting via query optimization, logic refinement, and tooling collaboration with partner engineering teams. Create and maintain documentation of common ETL resolution procedures for knowledge sharing.

2. Design and implement ontology structures that effectively represent a knowledge domain, both conceptually in the real world and based on structured data, while maintaining flexibility for future expansion. Own ontology review documents, host and actively participate in ontology discussions, submit Change Requests (CRs), and merge CRs in the ontology codebase. Use generative AI tooling, like Rapid Ontology Creation for KEs (ROCK), to automate ontology and data mapping processes while integrating expertise at critical decision points. Develop mappings using JSON structures and Jinja templates to establish concrete relationships between the data layer and ontological constructs. Configure metadata on critical data values (e.g. foreign keys, external keys, data types) to ensure materialized mappings are valid and comprehensive.

3. Enable query grounding on Amazon Knowledge Graph systems (e.g. Graphiq collections, Knowledge Panels, Neptune Graphs) by creating semantic understanding and materialization patterns via search templates, narratives, Jinja template verbalizations, and SparkSQL or Cypher queries. Work with partner engineering and science teams to debug tooling issues, provide training data for ranking and retrieval model improvements, and generate billions of quads from millions of data points local to sources like Wikidata, FireTV, IMDb, etc. Write Cypher queries to add to the production index.

4. Record and review Amazon Knowledge Graph system projects in comprehensive documents to track progress, provide a forum for reviewer feedback, and evaluate query grounding performance in beta and production environments. Perform comprehensive pre-launch quality assurance processes, including integration testing and evaluations, to ensure high-quality customer experiences. Conduct thorough testing in beta and production environments before deployment of new features or updates, including exhaustive customer query tests in gamma and production environments to judge the quality of the user experience.

5. Discover and investigate data gaps, quality issues, and failure patterns through comprehensive failure space analysis (FSA) of Alexa/Alexa+ customer data. Create detailed reports categorizing issues by impact and complexity, while developing measurement frameworks to track improvements. Design and propose actionable solutions, such as data sourcing, improving or adding data pipelines, and updating existing grounding technique configurations, based on the evaluation results.

6. Participate in the team on-call rotation by acting as the point person for any high-severity (sev 2.5+) tickets submitted to the team and triaging issues independently on a rotational basis.

7. Triage customer- and internally-reported issues by performing root cause analysis on non-deterministic systems, re-assigning issues to other teams when applicable, and implementing short- and long-term resolutions to mitigate customer impact and improve the overall experience.

8. Design and implement comprehensive monitoring systems that track key performance indicators across data pipelines and knowledge grounding systems. Monitor dashboards to provide real-time insights into knowledge graph response quality and performance. Create automated alerting mechanisms using ETL for anomaly detection and establish baseline metrics for continuous improvement. Track Alexa customer utterance defects by creating and evaluating MESA (Metric Extraction, Storage, and Access) tables to generate aggregated KPI metrics (e.g. Customer Perceived Defect Rate, Claim Accuracy Rate). Write custom SQL logic to create metrics dashboards using Redash and MESA.
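The quad-generation work described above (turning millions of structured source data points into billions of graph quads) can be illustrated with a minimal Python sketch. The entity IDs, predicate names, and graph label below are hypothetical stand-ins; the actual Graphiq and Neptune tooling and Amazon Knowledge Graph identifiers are not shown.

```python
# Minimal sketch of converting structured source rows into RDF-style quads
# (subject, predicate, object, graph). Field names and the "imdb" graph
# label are illustrative, not actual Amazon Knowledge Graph identifiers.

def rows_to_quads(rows, graph):
    """Emit one quad per non-key, non-null column of each source row."""
    quads = []
    for row in rows:
        subject = row["id"]
        for key, value in row.items():
            if key == "id" or value is None:
                continue  # skip the key column and missing values
            quads.append((subject, key, value, graph))
    return quads

movies = [
    {"id": "movie:tt0111161", "title": "The Shawshank Redemption", "year": 1994},
    {"id": "movie:tt0068646", "title": "The Godfather", "year": None},
]
quads = rows_to_quads(movies, graph="imdb")
```

A real pipeline would map source columns to ontology predicates through the JSON/Jinja mappings described above rather than reusing column names verbatim.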

Requirements

  • A Bachelor’s degree or foreign equivalent in a quantitative or analytical field, followed by 2 year(s) of progressively responsible experience in the job offered or a related occupation.
  • Must have 1 year(s) of experience in the following:
  • Experience in building and updating data models in one or more sub-domains that efficiently and extensibly solve difficult business and/or technology problems.
  • Experience in designing solutions where the domain model, business problems, and/or data requirements may not be completely defined, with the ability to present a variety of solutions with the benefits and challenges of each approach.
  • Experience in analyzing and distilling large or unnormalized data sets to derive conclusions and inform data-model decisions.
  • Experience in working on project ideas with partner teams, technical stakeholders, and peers. Experience interpreting product requirements and diving into the architecture of any relevant systems to define technical project requirements, scope, and deadlines.
  • Experience in presenting solutions that are comprehensive, customer-focused, and demonstrate feasibility in production systems.
  • Experience in driving operational excellence, identifying model or process improvements and proposing solutions.
  • Experience in pushing code and/or holding design reviews that tend to be rapid and uneventful and provide useful reviews for changes submitted by others.
  • Experience in training new teammates on how the team’s domain data model is constructed, how it operates, and how it fits into the bigger picture.

Nice To Haves

  • All applicants must meet all of the above-listed requirements.

Responsibilities

  • Build and maintain scalable data pipelines using extract, transform, and load (ETL) software including Pentaho Data Integration, Amazon Business Data Technologies Cradle, and Amazon Knowledge Graph Data Lake to perform data cleaning and manipulation on large-scale datasets.
  • Design and build solutions by leveraging off-the-shelf services like AWS Glue; programming languages including JavaScript, SQL, SparkSQL, and Python; custom-made tools including Graphiq Imports and Data Lake S3 Crawler; and large language models (LLMs) like Cedric Personas and LLM Batch Inference.
  • Analyze and optimize pipeline performance through systematic monitoring and troubleshooting via query optimization, logic refinement, and tooling collaboration with partner engineering teams.
  • Create and maintain documentation of common ETL resolution procedures for knowledge sharing.
  • Design and implement ontology structures that effectively represent a knowledge domain both conceptually in the real world and based on structured data while maintaining flexibility for future expansion.
  • Own ontology review documents, host and actively participate in ontology discussions, submit Change Requests (CRs), and merge CRs in the ontology codebase.
  • Use generative AI tooling, like Rapid Ontology Creation for KEs (ROCK), to automate ontology and data mapping processes, while integrating expertise at critical decision points.
  • Develop mappings using JSON structures and Jinja templates to establish concrete relationships between the data layer and ontological constructs.
  • Configure metadata on critical data values (e.g. foreign keys, external keys, data types) to ensure materialized mappings are valid and comprehensive.
  • Enable query grounding on Amazon Knowledge Graph systems (e.g. Graphiq collections, Knowledge Panels, Neptune Graphs) by creating semantic understanding and materialization patterns via search templates, narratives, Jinja template verbalizations, SparkSQL or Cypher queries.
  • Work with partner engineering and science teams to debug tooling issues, provide training data for ranking and retrieval model improvements, and generate billions of quads from millions of data points local to sources like Wikidata, FireTV, IMDb, etc.
  • Write Cypher queries to add to the production index.
  • Record and review Amazon Knowledge Graph system projects in comprehensive documents to track progress, provide a forum for reviewer feedback, and evaluate query grounding performance in beta and production environments.
  • Perform comprehensive pre-launch quality assurance processes including integration testing and evaluations to ensure high-quality customer experiences.
  • Conduct thorough testing in beta and production environments before deployment of new features or updates, including exhaustive customer query tests in gamma and production environments to judge the quality of the user experience.
  • Discover and investigate data gaps, quality issues, and failure patterns through comprehensive failure space analysis (FSA) of Alexa/Alexa+ customer data.
  • Create detailed reports categorizing issues by impact and complexity, while developing measurement frameworks to track improvements.
  • Design and propose actionable solutions, such as data sourcing, improving or adding data pipelines, and updating existing grounding technique configurations, based on the evaluation results.
  • Participate in the team on-call rotation by acting as the point person for any high-severity (sev 2.5+) tickets submitted to the team and triaging issues independently on a rotational basis.
  • Triage customer- and internally-reported issues by performing root cause analysis on non-deterministic systems, re-assigning issues to other teams when applicable, and implementing short- and long-term resolutions to mitigate customer impact and improve overall experience.
  • Design and implement comprehensive monitoring systems that track key performance indicators across data pipelines and knowledge grounding systems.
  • Monitor dashboards to provide real-time insights into knowledge graph response quality and performance.
  • Create automated alerting mechanisms using ETL for anomaly detection and establish baseline metrics for continuous improvement.
  • Track Alexa customer utterance defects by creating and evaluating MESA (Metric Extraction, Storage, and Access) tables to generate aggregated KPI metrics (e.g. Customer Perceived Defect Rate, Claim Accuracy Rate).
  • Write custom SQL logic to create metrics dashboards using Redash and MESA.
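The KPI aggregation described in the monitoring bullets can be sketched in outline as follows. The record fields and the boolean defect flag are illustrative stand-ins for the actual MESA table schema, and the domain names are made up; in practice this aggregation would be expressed as SQL over MESA tables feeding a Redash dashboard.

```python
# Minimal sketch of aggregating a customer-perceived defect rate per domain
# from labeled utterance records. Fields are illustrative stand-ins for the
# actual MESA table schema.

from collections import defaultdict

def defect_rate_by_domain(records):
    """Return {domain: share of utterances flagged as defects}."""
    totals = defaultdict(int)
    defects = defaultdict(int)
    for rec in records:
        totals[rec["domain"]] += 1
        if rec["defect"]:
            defects[rec["domain"]] += 1
    return {d: defects[d] / totals[d] for d in totals}

utterances = [
    {"domain": "music", "defect": False},
    {"domain": "music", "defect": True},
    {"domain": "knowledge", "defect": False},
    {"domain": "knowledge", "defect": False},
]
rates = defect_rate_by_domain(utterances)
```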

Benefits

  • A sign-on bonus and restricted stock units may be provided as part of the compensation package, in addition to a full range of medical, financial, and/or other benefits, dependent on the position offered.