Data Engineer

Duckbill, San Francisco, CA
Onsite

About The Position

We are developing a SaaS product that simplifies financial planning and analysis of cloud billing data for large enterprises with complex cloud spending requirements. We're looking for a data engineer to wrangle complex cloud billing data by designing the pipelines that power our product. We have fascinating technical challenges around data modeling and continuous quality control. We analyze massive amounts of semistructured data, processing cloud bills whose schemas constantly evolve, and that complexity only increases as we expand functionality and provider support. On the frontend, customers use the data to drive large financial decisions, so full ownership of the data product and its quality is key.

Requirements

  • 3+ years of experience with data products: warehouses/lakehouses/OLAPs, ETL pipelines, or job queues
  • Software engineering experience, with intermediate Python experience
  • Strong SQL skills including CTEs, window functions, and query optimization
  • Experience with data validation and quality control systems
  • Comfortable with columnar databases, Parquet, and cloud storage (S3)
  • Ability to deliver results in hours instead of days
  • Some experience in a startup environment, or the ability to work well in one
  • Fastidiousness about data quality and comfort when there's no answer key

Nice To Haves

  • Experience with ClickHouse or other OLAP datastores
  • Past experience with cost management tools and/or cloud billing data
  • Experience with Airflow or similar workflow orchestration tools
  • Backend engineering experience beyond data pipelines

Responsibilities

  • Build and maintain ETL pipelines processing hundreds of millions of rows of cloud billing data
  • Work with ClickHouse, Parquet files, and S3 to design efficient data storage and retrieval systems
  • Develop data validation and quality control systems using Python and SQL
  • Design data models for complex, evolving cloud billing schemas (AWS CUR and beyond)
  • Build and optimize Airflow workflows for reliable data processing
  • Collaborate with the entire engineering team to investigate and resolve data quality issues
  • Scale data infrastructure as we expand to new cloud providers and use cases