Staff Data Engineer

Pismo
Remote

About The Position

We’re hiring a Staff Data Engineer to design and build robust data pipelines for our corporate Data Lake. In this role, you’ll own data products end-to-end in production, work autonomously on significant projects, and mentor junior engineers. You’ll be expected to make sound data engineering trade-offs under limited supervision, with strong hands-on experience in Spark, Databricks, Delta Lake, and Airflow.

At Pismo, the Data Lake team is responsible for centralizing and organizing data into a single, trusted platform that supports decision-making across the company and for external clients. We work on challenges such as scaling global data infrastructure, delivering high-quality reporting, and enabling secure, self-service access to data, helping teams move faster while avoiding information silos.

Requirements

  • Apache Spark (PySpark, SparkSQL) — production experience
  • Databricks (jobs, workflows, cluster management, tuning)
  • Delta Lake (ACID tables, OPTIMIZE, VACUUM, schema evolution, MERGE) — see the sketch after this list
  • Advanced SQL (window functions, CTEs, query optimization)
  • Apache Airflow / MWAA (DAG design, retries, backfills, SLAs)
  • Amazon S3 data lake design (partitioning, layout, lifecycle)
  • Data quality frameworks (Great Expectations or equivalent)
  • Data modeling (dimensional / Kimball, medallion layers)
  • Git/GitHub, CI/CD for data pipelines
  • Terraform
  • Python for automation and data processing
  • English (B2 level)
  • Based in Brazil
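
As a concrete illustration of the Delta Lake skills listed above, here is a minimal PySpark sketch of a MERGE upsert followed by routine table maintenance. The S3 path, table name (silver.accounts), and schema are hypothetical, and the surrounding job setup is assumed.

```python
# Minimal Delta Lake upsert sketch (hypothetical paths and table names).
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Incoming batch of changed records; schema (id, amount, updated_at) is assumed.
updates = spark.read.parquet("s3://example-bucket/staging/accounts/")

target = DeltaTable.forName(spark, "silver.accounts")

# MERGE upsert: update matched rows, insert new ones, atomically (ACID).
(target.alias("t")
    .merge(updates.alias("u"), "t.id = u.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())

# Routine maintenance: compact small files, then drop stale snapshot files.
spark.sql("OPTIMIZE silver.accounts")
spark.sql("VACUUM silver.accounts RETAIN 168 HOURS")
```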

Nice To Haves

  • CDC patterns (DMS, incremental processing, MERGE upserts)
  • Streaming ingestion (Structured Streaming, Auto Loader) — illustrated in the sketch below
  • AWS Glue Catalog / Unity Catalog
  • Metadata management (OpenMetadata)
  • BI integration (Superset, dashboarding)
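
For the streaming item above, here is a minimal sketch of Auto Loader ingestion into a bronze Delta table, assuming a Databricks runtime; bucket paths and the table name (bronze.events) are hypothetical.

```python
# Minimal Auto Loader sketch: incrementally ingest new S3 files into a bronze
# Delta table. Bucket paths and the table name are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

raw = (spark.readStream
    .format("cloudFiles")                # Auto Loader source (Databricks)
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "s3://example-bucket/_schemas/events/")
    .load("s3://example-bucket/landing/events/"))

(raw.writeStream
    .format("delta")
    .option("checkpointLocation", "s3://example-bucket/_checkpoints/events/")
    .trigger(availableNow=True)          # process new files, then stop
    .toTable("bronze.events"))
```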

Responsibilities

  • Design and implement data ingestion and transformation pipelines (batch and near-real-time) using PySpark/SparkSQL on Databricks.
  • Own data pipelines end-to-end in production: freshness, correctness, availability, and SLA adherence.
  • Build and maintain Delta Lake tables following medallion architecture patterns (bronze/silver/gold).
  • Design and optimize Airflow DAGs (MWAA) for complex orchestration scenarios; a minimal sketch follows this list.
  • Implement and maintain data quality frameworks (Great Expectations or equivalent) as integrated pipeline gates.
  • Write advanced SQL for data modeling, transformation, and performance optimization.
  • Conduct thorough code reviews.
  • Mentor Analyst-level engineers through pairing and design guidance.
  • Investigate, diagnose, and resolve data quality incidents and pipeline failures independently.
  • Collaborate with Analytics, BI, and Product teams to design consumer-friendly datasets.
  • Contribute to CI/CD, testing standards, and data governance practices.
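
To illustrate the orchestration and quality-gate responsibilities, here is a minimal Airflow (MWAA) sketch that runs a Databricks job and blocks downstream steps until a validation task passes. The DAG id, job id, connection id, and check body are hypothetical placeholders.

```python
# Minimal MWAA/Airflow sketch: run a Databricks transform job, then gate it
# behind a data-quality check. DAG id, job id, and the check body are
# hypothetical placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator


def run_quality_checks(**context):
    # Placeholder for a Great Expectations (or equivalent) validation suite;
    # raising an exception here fails the task and halts downstream steps.
    pass


with DAG(
    dag_id="silver_accounts_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    transform = DatabricksRunNowOperator(
        task_id="transform",
        databricks_conn_id="databricks_default",  # assumed Airflow connection
        job_id=12345,                             # hypothetical Databricks job
    )

    quality_gate = PythonOperator(
        task_id="quality_gate",
        python_callable=run_quality_checks,
    )

    transform >> quality_gate
```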

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: None listed
  • Number of Employees: 5,001-10,000
