Staff Software Engineer (Python)

Duetto Research
Remote

About The Position

This role is a Staff Software Engineer position on Duetto's Data Platform team, owning the data infrastructure that powers real-time pricing decisions for thousands of hotels worldwide. Duetto, founded in 2012, is the hospitality industry's leading revenue management platform and built the world's first Revenue & Profit Operating System. The company is known for its AI-first engineering culture, with tools like Claude Code and a custom multi-agent system integrated into daily workflows. Backed by GrowthCurve Capital since 2024, Duetto is accelerating its investment in AI and is genuinely passionate about the hospitality industry, building products for customers it cares about.

Requirements

  • 7+ years building production data systems in Python
  • Deep expertise in PySpark and distributed data processing — Glue, EMR, or Databricks
  • Strong experience with lakehouse architectures: Iceberg, Delta Lake, or Hudi on S3
  • Production experience with Airflow or a comparable workflow orchestrator
  • Solid AWS production experience across S3, Glue, Athena, Lambda, and SQS
  • A track record of improving data quality, governance, and pipeline reliability at scale

Nice To Haves

  • Working knowledge of Java for reading upstream systems
  • Experience with Trino or Presto for interactive SQL analytics at scale
  • Experience with dbt for data transformation and modelling
  • Familiarity with Great Expectations or similar data quality frameworks
  • Genuine interest in AI-assisted development and LLM-based tooling
  • Familiarity with hospitality data — reservations, rates, inventory, demand signals

Responsibilities

  • Own the design, performance, and reliability of Duetto's data lakehouse: evolving the Python/PySpark pipeline framework across a bronze → silver → gold architecture on AWS, including Glue jobs, Iceberg MERGE operations, schema evolution, and partitioning strategies (see the first sketch after this list).
  • Architect the shift from batch to near-real-time streaming, building SQS-driven stream pipelines with Iceberg sinks and expanding ingestion, normalisation, and analytics layers across the full lakehouse.
  • Drive data quality and governance at scale: extending the Great Expectations framework, leading adoption of data contracts to formalise schemas between producers and consumers, and owning the Athena SQL layer that analysts and product teams depend on (see the second sketch after this list).
  • Strengthen observability and reliability through Datadog, Sentry, and Sumo Logic, while optimising Glue job performance — worker sizing, DPU allocation, Spark tuning, and cost management.
  • Build and maintain shared internal Python libraries published to JFrog, and drive improvements to GitHub Actions, Docker-based testing, and CI/CD deployment workflows.
  • Work AI-first every day — using Claude Code and MCP tools in your regular workflow, and contributing to AI-assisted pipeline generation, schema inference, and automated data quality alongside a custom multi-agent system with 17 specialised agents.
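
To give a flavour of the first responsibility, here is a minimal sketch of a bronze → silver upsert using an Iceberg MERGE in PySpark. The catalog, table, and column names (glue_catalog, reservations_raw, reservation_id, updated_at) and the S3 warehouse path are illustrative assumptions, not Duetto's actual framework.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    # Hypothetical catalog/table names; the real pipeline framework,
    # schemas, and keys are not described in this posting.
    CATALOG = "glue_catalog"
    BRONZE = f"{CATALOG}.bronze.reservations_raw"
    SILVER = f"{CATALOG}.silver.reservations"

    spark = (
        SparkSession.builder
        .appName("bronze-to-silver-upsert-sketch")
        # Iceberg SQL extensions plus AWS Glue Data Catalog wiring (values are illustrative).
        .config("spark.sql.extensions",
                "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
        .config(f"spark.sql.catalog.{CATALOG}", "org.apache.iceberg.spark.SparkCatalog")
        .config(f"spark.sql.catalog.{CATALOG}.catalog-impl",
                "org.apache.iceberg.aws.glue.GlueCatalog")
        .config(f"spark.sql.catalog.{CATALOG}.warehouse", "s3://example-lakehouse/warehouse")
        .getOrCreate()
    )

    # Keep only the latest version of each reservation from the bronze layer.
    latest = Window.partitionBy("reservation_id").orderBy(F.col("updated_at").desc())
    updates = (
        spark.table(BRONZE)
        .withColumn("_rn", F.row_number().over(latest))
        .filter(F.col("_rn") == 1)
        .drop("_rn")
    )
    updates.createOrReplaceTempView("updates")

    # Iceberg MERGE: update existing silver rows, insert new ones.
    spark.sql(f"""
        MERGE INTO {SILVER} t
        USING updates s
        ON t.reservation_id = s.reservation_id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)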

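For the data-quality responsibility, here is a minimal sketch of column-level checks using Great Expectations' legacy SparkDFDataset API. The GE version, column names, and thresholds are assumptions, and how checks are wired into the real pipelines and alerting is not described in this posting.

    from great_expectations.dataset import SparkDFDataset

    # Reuses the Spark session and illustrative silver table from the previous sketch.
    silver_df = spark.table("glue_catalog.silver.reservations")
    gdf = SparkDFDataset(silver_df)

    # Illustrative expectations; real column names and bounds are assumptions.
    gdf.expect_column_values_to_not_be_null("reservation_id")
    gdf.expect_column_values_to_be_unique("reservation_id")
    gdf.expect_column_values_to_be_between("nightly_rate", min_value=0, max_value=100_000)

    result = gdf.validate()
    if not result.success:
        # In a real pipeline this might alert via Datadog or Sentry rather than raise.
        raise ValueError("Data quality checks failed for silver.reservations")
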
What This Job Offers

  • Job Type: Full-time
  • Career Level: Senior
  • Education Level: None listed
  • Number of Employees: 101-250 employees
