Staff Data Engineer (Scala, Spark, & Gen AI)

DISQO — Los Angeles, CA
Hybrid

About The Position

DISQO is seeking a visionary technical leader with expertise in distributed data processing (Scala/Spark) and a passion for the intersection of data engineering and Artificial Intelligence. This role will serve as a force multiplier, working closely with engineering leadership, product managers, and analysts in a collaborative environment focused on rapid innovation and systemic impact. The company believes in autonomous teams that take ownership and move quickly, utilizing agile development practices, modern tooling, and strong engineering discipline. DISQO emphasizes architectural excellence, data correctness, system reliability, and building intelligent systems responsibly.

As a Staff Data Engineer, you will set the technical direction for DISQO's ad measurement platform, architecting, building, and scaling complex data pipelines while integrating Generative AI capabilities into core data infrastructure and products. You will address scalability challenges using Spark and Scala for massive datasets and leverage LLMs to unlock new value. Operating with autonomy, you will lead cross-functional technical initiatives, drive architectural decisions, and pioneer AI integration for data enrichment, pipeline automation, and quality improvement. You will also mentor senior and mid-level engineers, enhancing the team's technical depth in big data, cloud infrastructure, and applied AI.

Requirements

  • 8+ years of experience building, architecting, and supporting complex production data pipelines, distributed systems, and backend infrastructure.
  • Expert-Level Scala & Spark: Deep, hands-on expertise in Scala and Apache Spark. You must understand Spark internals, query plans, memory management, and advanced performance tuning for massive-scale batch processing.
  • Applied Generative AI Experience: Proven experience integrating Gen AI / LLMs (e.g., OpenAI APIs, Anthropic, Bedrock) into data products or data engineering workflows, plus hands-on experience developing with AI tools such as Claude Code.
  • Strong Python Skills: Proficiency in Python specifically to interface with modern AI ecosystems, data APIs, and orchestration tools.
  • Cloud Mastery: Extensive architectural experience within the AWS ecosystem (EMR, Glue, Athena, S3, Bedrock, etc.).
  • Core Data Foundations: Deep understanding of advanced ETL/ELT concepts, complex data modeling, and performance-tuning SQL.
  • Orchestration: Expert-level experience with workflow orchestration tools such as Airflow.
  • Leadership: Proven track record of leading technical initiatives, making architectural decisions, and mentoring teams in an agile, fast-moving environment.

Nice To Haves

  • Experience with Snowflake or other modern cloud data warehouses.
  • Deep exposure to streaming or real-time event processing (Kafka, Flink, Kinesis, etc.).
  • Experience utilizing AI for automated data observability, anomaly detection, or data quality tooling.
  • Background in ad tech, measurement, attribution modeling, or specialized analytics platforms.

Responsibilities

  • Architect and Lead: Design, build, and maintain highly scalable, fault-tolerant data pipelines using expert-level Scala and Apache Spark.
  • Gen AI Integration: Pioneer the use of Generative AI within our data ecosystem—incorporating LLMs to enrich datasets, extract value from unstructured data, automate metadata generation, and build intelligent data products.
  • Cross-Functional Strategy: Partner with Product and Engineering leadership to translate complex business requirements into forward-looking data and AI-augmented architectures.
  • Optimize Systems: Architect and aggressively optimize large-scale ETL/ELT workflows. Dive deep into Spark internals to resolve complex performance bottlenecks, memory issues, and data skew.
  • Modern AI Tooling: Implement and manage infrastructure to support AI integration, including vector databases, embeddings pipelines, and Retrieval-Augmented Generation (RAG) architectures.
  • Set the Standard: Write clean, highly optimized, and maintainable code, while establishing standards for code quality, testing, and system architecture across the organization.
  • Ensure Operational Excellence: Champion data quality, observability, and system health to consistently meet enterprise SLAs and customer commitments.
  • Mentorship: Actively mentor engineers, lead technical design reviews, and foster a culture of continuous learning and technical rigor.

Benefits

  • 100% covered Medical/Dental/Vision for employees, with competitive dependent coverage
  • Stock options
  • 401K
  • Generous PTO policy
  • Team offsites, social events & happy hours
  • Life Insurance
  • Health FSA
  • Commuter FSA (for hybrid employees)
  • Catered lunch and fully stocked kitchen
  • Paid Maternity/Paternity leave
  • Disability Insurance
  • Travel Assistance Program
  • 24/7 Counseling Services offered to Employees