Staff Data Engineer

Pismo
Remote

About The Position

We’re hiring a Staff Data Engineer to design and build robust data pipelines for our corporate Data Lake. In this role, you’ll own data products end-to-end in production, work autonomously on significant projects, and mentor junior engineers. You’ll be expected to make sound data engineering trade-offs under limited supervision, with strong hands-on experience in Spark, Databricks, Delta Lake, and Airflow.

At Pismo, the Data Lake team is responsible for centralizing and organizing data into a single, trusted platform that supports decision-making across the company and for external clients. We work on challenges such as scaling global data infrastructure, delivering high-quality reporting, and enabling secure, self-service access to data, helping teams move faster while avoiding information silos.

Requirements

  • Apache Spark (PySpark, SparkSQL) — production experience
  • Databricks (jobs, workflows, cluster management, tuning)
  • Delta Lake (ACID tables, OPTIMIZE, VACUUM, schema evolution, MERGE) — see the sketch after this list
  • Advanced SQL (window functions, CTEs, query optimization)
  • Apache Airflow / MWAA (DAG design, retries, backfills, SLAs)
  • Amazon S3 data lake design (partitioning, layout, lifecycle)
  • Data quality frameworks (Great Expectations or equivalent)
  • Data modeling (dimensional / Kimball, medallion layers)
  • Git/GitHub, CI/CD for data pipelines
  • Terraform
  • Python for automation and data processing
  • English (B2 level)
  • Based in Brazil
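
As a concrete illustration of the Delta Lake skills listed above, here is a minimal PySpark sketch of a MERGE upsert followed by routine table maintenance. The S3 path, table name (silver.accounts), and schema are hypothetical, and the surrounding job setup is assumed.

```python
# Minimal Delta Lake upsert sketch (hypothetical paths and table names).
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Incoming batch of changed records; schema (id, amount, updated_at) is assumed.
updates = spark.read.parquet("s3://example-bucket/staging/accounts/")

target = DeltaTable.forName(spark, "silver.accounts")

# MERGE upsert: update matched rows, insert new ones, atomically (ACID).
(target.alias("t")
    .merge(updates.alias("u"), "t.id = u.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())

# Routine maintenance: compact small files, then drop stale snapshot files.
spark.sql("OPTIMIZE silver.accounts")
spark.sql("VACUUM silver.accounts RETAIN 168 HOURS")
```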

Nice To Haves

  • CDC patterns (DMS, incremental processing, MERGE upserts)
  • Streaming ingestion (Structured Streaming, Auto Loader) — illustrated in the sketch below
  • AWS Glue Catalog / Unity Catalog
  • Metadata management (OpenMetadata)
  • BI integration (Superset, dashboarding)
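
For the streaming item above, here is a minimal sketch of Auto Loader ingestion into a bronze Delta table, assuming a Databricks runtime; bucket paths and the table name (bronze.events) are hypothetical.

```python
# Minimal Auto Loader sketch: incrementally ingest new S3 files into a bronze
# Delta table. Bucket paths and the table name are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

raw = (spark.readStream
    .format("cloudFiles")                # Auto Loader source (Databricks)
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "s3://example-bucket/_schemas/events/")
    .load("s3://example-bucket/landing/events/"))

(raw.writeStream
    .format("delta")
    .option("checkpointLocation", "s3://example-bucket/_checkpoints/events/")
    .trigger(availableNow=True)          # process new files, then stop
    .toTable("bronze.events"))
```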

Responsibilities

  • Design and implement data ingestion and transformation pipelines (batch and near-real-time) using PySpark/SparkSQL on Databricks.
  • Own data pipelines end-to-end in production: freshness, correctness, availability, and SLA adherence.
  • Build and maintain Delta Lake tables following medallion architecture patterns (bronze/silver/gold).
  • Design and optimize Airflow DAGs (MWAA) for complex orchestration scenarios; a minimal sketch follows this list.
  • Implement and maintain data quality frameworks (Great Expectations or equivalent) as integrated pipeline gates.
  • Write advanced SQL for data modeling, transformation, and performance optimization.
  • Conduct thorough code reviews.
  • Mentor Analyst-level engineers through pairing and design guidance.
  • Investigate, diagnose, and resolve data quality incidents and pipeline failures independently.
  • Collaborate with Analytics, BI, and Product teams to design consumer-friendly datasets.
  • Contribute to CI/CD, testing standards, and data governance practices.
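
To illustrate the orchestration and quality-gate responsibilities, here is a minimal Airflow (MWAA) sketch that runs a Databricks job and blocks downstream steps until a validation task passes. The DAG id, job id, connection id, and check body are hypothetical placeholders.

```python
# Minimal MWAA/Airflow sketch: run a Databricks transform job, then gate it
# behind a data-quality check. DAG id, job id, and the check body are
# hypothetical placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator


def run_quality_checks(**context):
    # Placeholder for a Great Expectations (or equivalent) validation suite;
    # raising an exception here fails the task and halts downstream steps.
    pass


with DAG(
    dag_id="silver_accounts_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    transform = DatabricksRunNowOperator(
        task_id="transform",
        databricks_conn_id="databricks_default",  # assumed Airflow connection
        job_id=12345,                             # hypothetical Databricks job
    )

    quality_gate = PythonOperator(
        task_id="quality_gate",
        python_callable=run_quality_checks,
    )

    transform >> quality_gate
```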

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: None listed
  • Number of Employees: 5,001-10,000
