Databricks Platform Architect

Koantek•Chandler, AZ

51d•Remote

About The Position

We are seeking an accomplished, technology-driven Lead Data Platform Architect / Migration Specialist to spearhead the modernization of our core enterprise financial and tax allocation engines. In this role, you will lead the architectural design, migration strategy, and hands-on implementation needed to move large-scale legacy relational database systems (SQL Server/T-SQL) onto a modern, cloud-native Databricks Lakehouse platform. The ideal candidate will have extensive experience in high-throughput distributed systems, Databricks compute optimization, performance tuning, and complex pipeline orchestration.

Requirements

Expert-level knowledge of Databricks (Lakehouse architecture, Delta Lake, Unity Catalog) and Apache Spark / PySpark.
Strong background in relational databases, with advanced proficiency in SQL Server, T-SQL, and stored procedures, and the ability to reverse-engineer and refactor legacy database logic into distributed paradigms.
Hands-on experience with Apache Airflow or similar modern workflow orchestrators.
Proven track record in cost optimization (FinOps), cluster tuning, autoscaling configurations, and handling skewed data profiles.
Experience with Infrastructure as Code (Terraform), data build tool (dbt), testing frameworks (pytest), and automated Git-based workflows.
10+ years of experience in Data Engineering/Architecture, including at least 3 years specifically leading large-scale cloud data migrations.
Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field.

Nice To Haves

Databricks Certified Data Engineer Associate / Professional
Databricks Certified Solutions Architect
AWS Certified Database - Specialty or equivalent cloud certifications

Responsibilities

Validate, refine, and own the target architecture on Databricks. Define robust migration strategies and production-ready reference patterns to convert 150+ complex stored procedures into PySpark and Structured/Declarative Pipelines (SDP).
Design distributed processing frameworks, control flows, and configuration-driven parameter handling for both full and incremental recalculation modes.
Close performance gaps between small and large workloads. Architect and implement acceleration techniques such as caching, partition pruning, cluster sizing, and offline/pre-calculation strategies to maintain sub-30-second user-facing reporting SLAs.
Design and deploy enterprise-level pipeline orchestration using tools like Apache Airflow or Databricks Workflows. Integrate robust logging, error handling, and observability patterns into existing enterprise monitoring frameworks.
Implement data governance models, data lineage, and schema evolution using tools such as Unity Catalog.
Establish best practices for AI-assisted code generation (e.g., using Claude or other advanced LLMs), providing code-review patterns and refactoring frameworks to ensure maintainable and performant output.
Lead code walkthroughs, design reviews, and pair-programming sessions with the development team to accelerate knowledge transfer and technical excellence.
