About The Position

We are hiring a Senior Lead Data Engineer to build and scale the data foundations that power Paramount’s next-generation personalization systems across Home, Search/Browse, Notifications, and Artwork. This role sits at the core of the Content Engineering vertical, partnering closely with the Applied ML, ML Platform, and Causal Science teams to deliver highly reliable, ML-ready data at global scale. You will design and operate pipelines that process billions of daily events, petabyte-scale feature stores, and real-time engagement streams supporting ranking and recommendations. This is a high-impact role for an engineer who thrives on distributed systems, large-scale ETL and streaming, and delivering production-grade infrastructure for cutting-edge personalization.

Paramount is investing heavily in a unified personalization operating model. In this role, you will directly shape:

  • The Data Backbone: Building the core of our personalization ecosystem.
  • The User Experience: Defining the feature sets that determine what millions of users see.
  • Innovation Velocity: Enabling ML teams to innovate quickly and safely through high-quality experimentation data.

Requirements

  • 7+ years of experience in large-scale data or software engineering.
  • Hands-on Expertise: Deep experience with Spark (PySpark/Scala), Databricks, Airflow, and Kafka.
  • ML Data Modeling: Proficiency in feature pipelines, temporal joins, and mitigating training-serving skew.
  • Cloud Ecosystems: Experience with AWS/Azure/GCP and high-performance engines like Snowflake or Redshift.
  • Technical Foundations: Proficient programming skills in Python and SQL with a focus on performance optimization.
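To make the ML data-modeling requirement above concrete — temporal joins with point-in-time correctness, so training features never leak future information (a common cause of training-serving skew) — here is a minimal sketch in plain Python. Names and data are hypothetical; a production pipeline would express the same as-of join in Spark:

```python
from bisect import bisect_right

def point_in_time_join(label_events, feature_history):
    """For each (user, label_time, label) row, attach the latest feature
    value observed at or before label_time -- never a future value,
    which would leak information into training."""
    # Index feature history per user, sorted by timestamp.
    by_user = {}
    for user, ts, value in sorted(feature_history, key=lambda r: r[1]):
        times, values = by_user.setdefault(user, ([], []))
        times.append(ts)
        values.append(value)

    rows = []
    for user, label_ts, label in label_events:
        times, values = by_user.get(user, ([], []))
        i = bisect_right(times, label_ts)  # excludes features strictly after label_ts
        feature = values[i - 1] if i > 0 else None
        rows.append((user, label_ts, feature, label))
    return rows

# Hypothetical data: a watch-count feature updated over time.
features = [("u1", 10, 3), ("u1", 20, 7), ("u2", 5, 1)]
labels = [("u1", 15, 1), ("u1", 25, 0), ("u2", 4, 1)]
print(point_in_time_join(labels, features))
# -> [('u1', 15, 3, 1), ('u1', 25, 7, 0), ('u2', 4, None, 1)]
```

Note that the label at time 15 sees the feature value 3 (written at time 10), not 7 (written at time 20): the same cutoff the serving path would observe.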

Nice To Haves

  • Experience in personalization domains (search, ranking, or recommender systems).
  • Experience supporting petabyte-scale data lakehouses or feature stores.
  • Familiarity with GenAI/RAG systems, multimodal content, or Delta Live Tables.
  • Knowledge of Causal Inference, experimentation signals, or ML evaluation workflows.
  • Experience with Terraform for governed, repeatable deployments.

Responsibilities

  • Build & Operate Large-Scale Feature Pipelines: Design and maintain batch/streaming pipelines (Spark, Flink, Databricks, Airflow) producing ML features for ranking models.
  • Ensure Point-in-Time Correctness: Develop feature sets that enable unbiased offline training and credible online inference.
  • Develop Embedding & Content Pipelines: Build scalable workflows for metadata, imagery, and multimodal representations; partner with Science teams to operationalize new models.
  • Architect Data Foundations: Design Delta/Parquet data models and medallion layers, optimizing storage layout and partitioning for latency and cost.
  • Real-Time Engineering: Build Kafka-based systems for real-time features and user-activity aggregations, ensuring robust handling of out-of-order events and exactly-once semantics.
  • Governance & Leadership: Define data quality rules and schema evolution processes while collaborating across ML pods to translate model needs into infrastructure.
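As a rough illustration of the real-time engineering responsibility above — tolerating out-of-order events and counting each event exactly once despite at-least-once delivery — here is a minimal plain-Python sketch of a watermark-based windowed counter with idempotent deduplication. It is illustrative only; in production this logic would live in Kafka consumers with Flink or Spark Structured Streaming:

```python
class WindowedCounter:
    """Counts events per (window, user), tolerating out-of-order arrival
    up to `allowed_lateness` seconds and deduplicating by event id so
    redelivered events are counted exactly once."""

    def __init__(self, window_size=60, allowed_lateness=30):
        self.window_size = window_size
        self.allowed_lateness = allowed_lateness
        self.max_event_time = 0   # highest event time seen; drives the watermark
        self.seen_ids = set()     # idempotency: exactly-once counting
        self.counts = {}          # (window_start, user) -> count, open windows
        self.closed = {}          # finalized windows

    def watermark(self):
        return self.max_event_time - self.allowed_lateness

    def process(self, event_id, user, event_time):
        if event_id in self.seen_ids:
            return  # duplicate delivery -- already counted
        if event_time < self.watermark():
            return  # too late: its window has been finalized
        self.seen_ids.add(event_id)
        self.max_event_time = max(self.max_event_time, event_time)
        window = event_time - event_time % self.window_size
        self.counts[window, user] = self.counts.get((window, user), 0) + 1
        # Finalize any window that ends at or before the new watermark.
        for key in [k for k in self.counts
                    if k[0] + self.window_size <= self.watermark()]:
            self.closed[key] = self.counts.pop(key)

c = WindowedCounter()
c.process("e1", "u1", 10)   # window [0, 60)
c.process("e2", "u1", 70)   # window [60, 120); watermark advances to 40
c.process("e1", "u1", 10)   # duplicate id, ignored
c.process("e3", "u1", 45)   # out of order but within lateness, counted in [0, 60)
c.process("e4", "u1", 130)  # watermark 100 closes window [0, 60) with count 2
```

The design choice to check `seen_ids` before the watermark makes counting idempotent under redelivery, while the watermark bounds how long state for an open window must be retained.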

Benefits

  • Medical
  • Dental
  • Vision
  • 401(k) plan
  • Life insurance coverage
  • Disability benefits
  • Tuition assistance program
  • PTO
  • Bonus eligibility