Data Engineer, IMDb

Amazon.com, Inc. • Seattle, WA

About The Position

IMDb is the world's most popular and authoritative source for information on movies, TV shows, and celebrities. With more than 250 million monthly unique visitors, IMDb connects entertainment fans worldwide with comprehensive information on over 12 million titles and 13 million cast and crew members. We're seeking a Data Engineer who is a fan of entertainment and also understands the business of film and television. You'll help modernize our data infrastructure by upgrading legacy systems to scalable, cloud-native solutions, while using AI to automate and enhance data pipelines. This role requires technical depth, industry knowledge, and a willingness to experiment with emerging technologies.

You'll work alongside Data Engineers who are building the data foundation for IMDb's reporting and Insight products, Data Scientists who are building prediction and customer segmentation models, and Business Intelligence Engineers who deliver insights to IMDb stakeholders. Our team values technical excellence, experimentation, and using data to improve the entertainment discovery experience for millions of users.

Requirements

4+ years of data engineering experience
Experience with data modeling, warehousing and building ETL pipelines
Experience with SQL
Knowledge of professional software engineering and best practices for the full software development life cycle, including coding standards, software architecture, code reviews, source control management, continuous deployment, testing, and operational excellence
Knowledge of distributed systems as it pertains to data storage and computing
Experience programming with at least one modern language such as C++, C#, Java, Python, Golang, PowerShell, Ruby
Knowledge of data governance, privacy, compliance, and security best practices

Nice To Haves

Experience with AWS technologies such as Redshift, S3, AWS Glue, EMR, Kinesis, Firehose, Lambda, and IAM roles and permissions
Experience with non-relational databases / data stores (object storage, document or key-value stores, graph databases, column-family databases)
Experience in the entertainment industry
Experience applying generative AI and LLMs to data engineering challenges

Responsibilities

Design, build, and maintain batch and streaming data pipelines that process terabytes of entertainment metadata, user traffic and engagement metrics, and entertainment industry data
Build ETL pipelines, defined in AWS CDK, using AWS services including S3, Glue, EMR, Redshift, Kinesis, and MWAA (Airflow)
Implement AI-assisted data engineering workflows including LLM-powered data quality checks, automated schema evolution, and intelligent data cataloging
Collaborate with Data Scientists to productionize GenAI and ML models that support content moderation, user personalization, customer segmentation, and more
Partner with Business Intelligence Engineers to build data models supporting analytics, reporting, and business decision-making
Optimize query performance and data storage costs across petabyte-scale datasets
Establish data quality frameworks and monitoring systems to ensure accuracy of entertainment industry data
Mentor junior engineers on data engineering best practices and AWS technologies