Data Platform Engineer

Hitachi Digital Services, Dallas, TX

About The Position

As a Data Platform Engineer, you will lead the design, development, and optimization of our large-scale, cloud-native data platform. You will architect and build robust ETL/ELT pipelines using PySpark and Databricks, leveraging Delta Lake, Unity Catalog, and Delta Live Tables. You will own DevOps automation through GitHub Actions and ensure fast, reliable deployments of Databricks assets. You will also manage the AWS infrastructure supporting the platform—focusing on secure, scalable, and high-performing environments. This role requires deep expertise in distributed data processing, Databricks engineering, CI/CD automation, and cloud infrastructure.

Requirements

  • 10+ years of experience building scalable Data Engineering platforms and production-grade pipelines
  • 3+ years of hands-on Databricks development, including expertise in:
      • Delta Lake (ACID, time travel, optimization)
      • Unity Catalog (security, governance, metadata)
      • Delta Live Tables (DLT)
      • Workspaces, Repos, Jobs, and Databricks SQL
  • 3+ years of AWS experience, including:
      • VPC, Subnets, Endpoints, Routing
      • IAM roles, policies, cross-account access
      • S3-based data lake implementation
  • Expert programming skills in Python (4+ years)
  • Deep hands-on experience with PySpark and advanced SQL
  • Proven CI/CD experience using GitHub Actions or similar tools
  • Strong understanding of ETL/ELT, Data Lake, Data Warehouse, and distributed computing concepts
  • Agile (Scrum) experience and Git proficiency

Nice To Haves

  • Experience with AWS data services such as Glue, Athena, Redshift, RDS, DynamoDB
  • Knowledge of real-time streaming (Kafka, Spark Structured Streaming)
  • Experience building ML feature pipelines
  • Background in performance tuning and capacity planning for large Spark clusters

Responsibilities

  • Build and maintain high-scale ETL/ELT pipelines across diverse data sources
  • Implement and optimize Databricks workflows using PySpark, Python, DLT, and Unity Catalog
  • Configure and manage AWS environments including VPCs, IAM, S3, and secure connectivity
  • Establish CI/CD pipelines using GitHub Actions for automated deployment of Databricks notebooks, jobs, and pipelines
  • Drive data quality via automated testing frameworks (unit, integration, performance)
  • Optimize cluster performance and cost efficiency
  • Lead best practices in Medallion Architecture, ACID data principles, and high-performance SQL
  • Create clear technical documentation, architecture diagrams, and design specifications
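The data-quality responsibility above is often implemented as assertion-style checks that run in CI before a pipeline is promoted. A minimal pure-Python sketch, with function names and sample records that are illustrative rather than taken from any specific framework:

```python
def check_not_null(rows, column):
    """Return (passed, offending_rows) for a null check on one column."""
    bad = [r for r in rows if r.get(column) is None]
    return len(bad) == 0, bad

def check_unique(rows, column):
    """Return (passed, values) for a uniqueness check on one column."""
    values = [r[column] for r in rows]
    return len(values) == len(set(values)), values

# Illustrative records, e.g. a sample pulled from a silver table.
sample = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": 7.5},
]

ok_nulls, _ = check_not_null(sample, "amount")
ok_unique, _ = check_unique(sample, "order_id")
assert ok_nulls and ok_unique  # both checks pass on this clean sample
```

Checks like these are usually wired into the CI pipeline (e.g. as pytest cases triggered by GitHub Actions) so a failing rule blocks deployment of the job that produced the data.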

Benefits

  • We help take care of your today and tomorrow with industry-leading benefits, support, and services that look after your holistic health and wellbeing.
  • We’re also champions of life balance and offer flexible arrangements that work for you (role and location dependent).