About The Position

Do you want to join an innovative team of scientists and engineers who use machine learning and artificial intelligence to help Amazon provide the best customer experience by preventing eCommerce fraud? Are you excited by the prospect of building scalable data infrastructure and pipelines that process terabytes of data, enabling state-of-the-art algorithms to solve real-world problems? Do you like to own end-to-end data systems and directly impact the team's ability to deliver insights and models that drive company profitability? Do you enjoy collaborating in a diverse team environment? If yes, then you may be a great fit for the Amazon Selling Partner Trust & Store Integrity Science Team. We are looking for a talented data engineer who is passionate about building robust data platforms and pipelines that empower scientists to develop advanced machine learning systems, helping manage the safety of millions of transactions every day and scaling up our operations with automation.

Requirements

  • 3+ years of data engineering experience
  • 1+ years of experience developing and operating large-scale data structures for business intelligence analytics, using ETL/ELT processes, OLAP technologies, data modeling, SQL, and Oracle
  • Experience with data modeling, warehousing and building ETL pipelines

Nice To Haves

  • Experience with AWS technologies like Redshift, S3, AWS Glue, EMR, Kinesis, FireHose, Lambda, and IAM roles and permissions
  • Experience with non-relational databases / data stores (object storage, document or key-value stores, graph databases, column-family databases)

Responsibilities

  • DATA INFRASTRUCTURE & PIPELINE DEVELOPMENT
    - Design, build, and maintain scalable data pipelines that support multiple ML model training and inference workflows
    - Develop and optimize ETL processes to ingest, transform, and prepare terabytes of data from diverse sources for model consumption
    - Implement robust data quality checks and monitoring systems to ensure data integrity across all pipelines
  • ML OPERATIONS SUPPORT
    - Build and maintain infrastructure for model training pipelines, including feature engineering, data versioning, and experiment tracking
    - Design and implement scalable inference pipelines that serve predictions for millions of transactions with low latency and high reliability
    - Collaborate with scientists to productionize ML models, translating research code into production-ready systems
  • SYSTEM PERFORMANCE & RELIABILITY
    - Optimize data processing workflows for cost efficiency and performance, managing compute and storage resources effectively
    - Implement monitoring, alerting, and logging systems to ensure pipeline reliability and quick issue resolution
    - Maintain comprehensive documentation of data schemas, pipeline architectures, and operational procedures
  • CROSS-FUNCTIONAL COLLABORATION
    - Partner closely with scientists to understand data requirements and translate them into technical solutions
    - Work with stakeholders to define data SLAs and ensure systems meet business needs
    - Provide technical guidance on data architecture decisions and best practices

Benefits

  • health insurance (medical, dental, vision, and prescription; Basic Life & AD&D insurance with optional supplemental life plans; EAP, mental health support, and a medical advice line; Flexible Spending Accounts; adoption and surrogacy reimbursement coverage)
  • 401(k) matching
  • paid time off
  • parental leave