Data Engineer, Data Pipelines and ETL

Paramount • Burbank, CA

$99,000 - $147,000

About The Position

The Data Engineering team is hiring a Data Engineer – Data Pipeline & ETL. You will help build and maintain scalable data platforms and ETL/ELT pipelines in a fast-moving environment. In this role, you will build and support batch and real-time data systems powering analytics, ML, and AI applications. You will also grow your expertise in modern data architecture and cloud-native best practices.

Requirements

2–4+ years of experience building and scaling ETL/ELT pipelines in production environments.
Experience with workflow orchestration tools such as Airflow, Composer, or similar platforms.
Strong understanding of distributed data processing concepts.
Expert-level SQL skills for large-scale transformation and analytics.
Experience designing scalable warehouse schemas and ML-ready data layers.
Strong experience optimizing complex queries across multi-terabyte datasets.
Proficiency in Python (or similar language) for data processing and ML pipeline integration.
Experience with distributed processing frameworks such as Spark.
Experience building real-time data pipelines using Kafka, Pub/Sub, or similar technologies.
Understanding of feature streaming, low-latency data processing, and event-driven architectures.
Ability to architect and build real-time dashboards using Superset.
Experience designing cloud-native data architectures (GCP preferred).
Experience with lakehouse architectures and cloud data warehouses.
Bachelor's or Master's degree in Computer Science, Engineering, or related field (or equivalent experience).
2–4+ years of experience in data engineering, data pipeline development, or related fields.
Strong foundation in modern data engineering principles, distributed systems design, and cloud-native architectures.
Demonstrated ability to design and operate large-scale production data systems.
Proven track record of technical leadership and cross-functional collaboration.
Strong problem-solving skills and ability to thrive in complex, fast-paced environments.
Detail-oriented and committed to engineering excellence and continuous improvement.

Nice To Haves

Familiarity with integrating data pipelines with ML platforms such as Vertex AI (preferred), Databricks ML, or equivalent.
Familiarity with vector databases, embeddings pipelines, and AI-serving infrastructure is a plus.
Experience with GCP is preferred for cloud-native data architecture work.

Responsibilities

Design, develop, and maintain scalable batch and streaming data pipelines for large-scale structured and unstructured datasets.
Build robust ETL/ELT frameworks supporting analytics, BI, experimentation, and machine learning use cases.
Optimize pipelines for performance, reliability, scalability, and cost efficiency.
Implement advanced ingestion patterns including CDC, incremental loads, and event-driven processing.
Design scalable, dimensional, and hybrid data models optimized for analytics and ML use cases.
Develop reusable transformation layers (semantic layers) that serve BI, ML, and AI applications.
Write optimized, production-grade SQL for large-scale analytics workloads.
Contribute to query optimization, indexing, partitioning, and performance tuning across distributed systems and cloud warehouses.
Build and maintain modular data components following established framework patterns.
Contribute to architectural decisions across streaming systems, data lakes, and warehouses.
Implement automated data validation, anomaly detection, and monitoring frameworks.
Establish data lineage and metadata standards to support reproducibility in ML workflows.
Enforce governance, privacy, and security best practices, particularly for sensitive AI datasets.
Ensure responsible AI data usage and adherence to compliance standards.