1. Build and maintain scalable data pipelines using extract, transform, and load (ETL) software, including Pentaho Data Integration, Amazon Business Data Technologies Cradle, and Amazon Knowledge Graph Data Lake, to perform data cleaning and manipulation on large-scale datasets. Design and build solutions by leveraging off-the-shelf services such as AWS Glue; programming languages including JavaScript, SQL, SparkSQL, and Python; custom-built tools including Graphiq Imports and Data Lake S3 Crawler; and large language models (LLMs) such as Cedric Personas and LLM Batch Inference. Analyze and optimize pipeline performance through systematic monitoring and troubleshooting, including query optimization, logic refinement, and tooling collaboration with partner engineering teams. Create and maintain documentation of common ETL resolution procedures for knowledge sharing.

2. Design and implement ontology structures that effectively represent a knowledge domain, both conceptually and in terms of its structured data, while maintaining flexibility for future expansion. Own ontology review documents, host and actively participate in ontology discussions, and submit and merge Change Requests (CRs) in the ontology codebase. Use generative AI tooling, such as Rapid Ontology Creation for KEs (ROCK), to automate ontology and data mapping processes while integrating human expertise at critical decision points. Develop mappings using JSON structures and Jinja templates to establish concrete relationships between the data layer and ontological constructs. Configure metadata on critical data values (e.g., foreign keys, external keys, data types) to ensure materialized mappings are valid and comprehensive.

3. Enable query grounding on Amazon Knowledge Graph systems (e.g., Graphiq collections, Knowledge Panels, Neptune Graphs) by creating semantic understanding and materialization patterns via search templates, narratives, Jinja template verbalizations, and SparkSQL or Cypher queries.
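As an illustration of the JSON-structure-plus-template mapping pattern described in duty 2, the sketch below materializes ontology triples from a data row. All table, key, and predicate names are hypothetical, and Python's stdlib `string.Template` stands in for Jinja so the example stays dependency-free:

```python
import json
from string import Template

# Hypothetical mapping config: which columns feed which ontology predicate.
mapping = json.loads("""
{
  "subject_key": "movie_id",
  "predicates": {
    "hasTitle": "title",
    "releasedInYear": "year"
  }
}
""")

# A Jinja template would normally produce the verbalization; a stdlib
# Template is used here as a minimal stand-in.
triple_template = Template('<$subject> <$predicate> "$value" .')

def render_triples(row: dict) -> list[str]:
    """Materialize one triple per mapped predicate for a single data row."""
    subject = row[mapping["subject_key"]]
    return [
        triple_template.substitute(subject=subject, predicate=pred, value=row[col])
        for pred, col in mapping["predicates"].items()
    ]

row = {"movie_id": "m123", "title": "Example Film", "year": 2021}
print(render_triples(row))
```

In practice the metadata configured on keys and data types (duty 2) would be validated before rendering; this sketch only shows the mapping-to-materialization step.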
Work with partner engineering and science teams to debug tooling issues, provide training data for ranking and retrieval model improvements, and generate billions of quads from millions of data points drawn from sources such as Wikidata, FireTV, and IMDb. Write Cypher queries to add to the production index.

4. Record and review Amazon Knowledge Graph system projects in comprehensive documents to track progress, provide a forum for reviewer feedback, and evaluate query grounding performance in beta and production environments. Perform comprehensive pre-launch quality assurance, including integration testing and evaluations, to ensure high-quality customer experiences. Conduct thorough testing before deploying new features or updates, including exhaustive customer query tests in gamma and production environments to judge the quality of the user experience.

5. Discover and investigate data gaps, quality issues, and failure patterns through comprehensive failure space analysis (FSA) of Alexa/Alexa+ customer data. Create detailed reports categorizing issues by impact and complexity, and develop measurement frameworks to track improvements. Design and propose actionable solutions, such as sourcing new data, improving or adding data pipelines, and updating existing grounding technique configurations, based on evaluation results.

6. Participate in the team's on-call rotation, acting as the point person for high-severity (sev 2.5+) tickets submitted to the team and triaging issues independently on a rotational basis.

7. Triage customer- and internally reported issues by performing root cause analysis on non-deterministic systems, reassigning issues to other teams when applicable, and implementing short- and long-term resolutions to mitigate customer impact and improve the overall experience.

8. Design and implement comprehensive monitoring systems that track key performance indicators across data pipelines and knowledge grounding systems.
Monitor dashboards to provide real-time insights into knowledge graph response quality and performance. Create automated alerting mechanisms using ETL for anomaly detection and establish baseline metrics for continuous improvement. Track Alexa customer utterance defects by creating and evaluating MESA (Metric Extraction, Storage, and Access) tables to generate aggregated KPI metrics (e.g., Customer Perceived Defect Rate, Claim Accuracy Rate). Write custom SQL logic to create metrics dashboards using Redash and MESA.
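The aggregated KPI metrics described in duty 8 can be sketched with a small SQL aggregation. An in-memory sqlite3 database stands in for the MESA/Redash stack here, and the table and column names are hypothetical:

```python
import sqlite3

# In-memory stand-in for a MESA-style utterance-defect table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE utterance_results (
        utterance TEXT,
        domain    TEXT,
        defective INTEGER  -- 1 if the response was judged defective
    )
""")
conn.executemany(
    "INSERT INTO utterance_results VALUES (?, ?, ?)",
    [
        ("play jazz",        "music",     0),
        ("who wrote dune",   "knowledge", 1),
        ("weather tomorrow", "weather",   0),
        ("oldest bridge",    "knowledge", 0),
    ],
)

# A defect-rate-style metric per domain: defective utterances / total utterances.
rows = conn.execute("""
    SELECT domain,
           ROUND(1.0 * SUM(defective) / COUNT(*), 3) AS defect_rate
    FROM utterance_results
    GROUP BY domain
    ORDER BY domain
""").fetchall()
print(rows)
```

In a dashboarding tool such as Redash, a query of this shape would back a chart or alert; the `1.0 *` factor forces floating-point division so integer counts don't truncate the rate.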
Job Type
Full-time
Career Level
Mid Level