Databricks Data Architect

Qode
California, CA (Onsite)

About The Position

We are seeking a Databricks Data Architect to support the design, implementation, and optimization of cloud-native data platforms built on the Databricks Lakehouse architecture. This is a hands-on, engineering-driven role requiring deep experience with Apache Spark, Delta Lake, and scalable data pipeline development, combined with early-stage architectural responsibilities. The role involves close onsite collaboration with client stakeholders, translating analytical and operational requirements into robust, high-performance data architectures while adhering to best practices for data modeling, governance, reliability, and cost efficiency.

Nice To Haves

  • Exposure to Databricks Unity Catalog, data governance, and access control models
  • Experience with Databricks Workflows, Apache Airflow, or Azure Data Factory for orchestration
  • Familiarity with streaming frameworks (Spark Structured Streaming, Kafka) and/or CDC patterns (a minimal streaming sketch follows this list)
  • Understanding of data quality frameworks, validation checks, and observability concepts
  • Experience integrating Databricks with BI tools such as Power BI, Tableau, or Looker
  • Awareness of cost optimization strategies in cloud-based data platforms
  • Prior experience in the life sciences domain
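
For candidates gauging the streaming expectations above, here is a minimal sketch of Spark Structured Streaming reading from Kafka into a Delta table. The broker address, topic name, payload schema, and paths are hypothetical placeholders, not details of any actual client pipeline.

    # Minimal sketch: Kafka -> Delta via Spark Structured Streaming.
    # All names and paths below are assumed placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("orders-stream-demo").getOrCreate()

    # Schema for the JSON payload carried in the Kafka message value (assumed).
    order_schema = StructType([
        StructField("order_id", StringType()),
        StructField("customer_id", StringType()),
        StructField("amount", DoubleType()),
    ])

    # Read the raw Kafka stream; the value column arrives as bytes.
    raw = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
        .option("subscribe", "orders")                     # placeholder topic
        .load()
    )

    # Decode and parse the JSON payload into typed columns.
    orders = raw.select(
        from_json(col("value").cast("string"), order_schema).alias("o")
    ).select("o.*")

    # Append to a Delta table; the checkpoint gives exactly-once writes.
    query = (
        orders.writeStream.format("delta")
        .option("checkpointLocation", "/tmp/checkpoints/orders")  # placeholder
        .outputMode("append")
        .start("/tmp/delta/orders")                               # placeholder
    )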

Responsibilities

  • Design, develop, and maintain batch and near-real-time data pipelines using Databricks, PySpark, and Spark SQL
  • Implement Medallion (Bronze/Silver/Gold) Lakehouse architectures, ensuring proper data quality, lineage, and transformation logic across layers (a pipeline sketch follows this list)
  • Build and manage Delta Lake tables, including schema evolution, ACID transactions, time travel, and optimized data layouts
  • Apply performance optimization techniques such as partitioning strategies, Z-Ordering, caching, broadcast joins, and Spark execution tuning
  • Support dimensional and analytical data modeling for downstream consumption by BI tools and analytics applications
  • Assist in defining data ingestion patterns (batch, incremental loads, CDC, and streaming where applicable)
  • Troubleshoot and resolve pipeline failures, data quality issues, and Spark job performance bottlenecks
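
To illustrate the kind of work these responsibilities describe, the sketch below shows one Bronze-to-Silver Medallion step combining Delta Lake schema evolution, a broadcast join, and Z-Ordering. The table names (bronze.events, silver.events, silver.dim_users) and columns are assumed for illustration; this is not a prescribed design.

    # Minimal Bronze -> Silver sketch under assumed table and column names.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast, col, to_date

    spark = SparkSession.builder.appName("medallion-demo").getOrCreate()

    # Bronze layer: raw ingested events (assumed to already exist).
    bronze = spark.read.table("bronze.events")

    # Small dimension table; broadcasting it avoids a shuffle on the join.
    dim_users = spark.read.table("silver.dim_users")

    silver = (
        bronze
        .filter(col("event_type").isNotNull())           # basic quality gate
        .withColumn("event_date", to_date("event_ts"))   # partition column
        .join(broadcast(dim_users), "user_id", "left")
    )

    # Write with schema evolution enabled so new upstream columns merge in,
    # partitioned by date for pruning on typical time-bounded queries.
    (
        silver.write.format("delta")
        .mode("append")
        .option("mergeSchema", "true")
        .partitionBy("event_date")
        .saveAsTable("silver.events")
    )

    # Compact small files and Z-Order by a commonly filtered column.
    spark.sql("OPTIMIZE silver.events ZORDER BY (user_id)")

Partitioning by date plus Z-Ordering on a high-cardinality filter column is a common starting point; the right layout depends on actual query patterns and would be tuned per workload.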