Data Platform Engineer

Hitachi Digital Services, Dallas, TX

About The Position

As a Data Platform Engineer, you will lead the design, development, and optimization of our large-scale, cloud-native data platform. You will architect and build robust ETL/ELT pipelines using PySpark and Databricks, leveraging Delta Lake, Unity Catalog, and Delta Live Tables. You will own DevOps automation through GitHub Actions and ensure fast, reliable deployments of Databricks assets. You will also manage the AWS infrastructure supporting the platform—focusing on secure, scalable, and high-performing environments. This role requires deep expertise in distributed data processing, Databricks engineering, CI/CD automation, and cloud infrastructure.

Requirements

  • 10+ years of experience building scalable Data Engineering platforms and production-grade pipelines
  • 3+ years of hands-on Databricks development, including expertise in:
      • Delta Lake (ACID, time travel, optimization)
      • Unity Catalog (security, governance, metadata)
      • Delta Live Tables (DLT)
      • Workspaces, Repos, Jobs, and Databricks SQL
  • 3+ years of AWS experience, including:
      • VPC, Subnets, Endpoints, Routing
      • IAM roles, policies, cross-account access
      • S3-based data lake implementation
  • Expert programming skills in Python (4+ years)
  • Deep hands-on experience with PySpark and advanced SQL
  • Proven CI/CD experience using GitHub Actions or similar tools
  • Strong understanding of ETL/ELT, Data Lake, Data Warehouse, and distributed computing concepts
  • Agile (Scrum) experience and Git proficiency

Nice To Haves

  • Experience with AWS data services such as Glue, Athena, Redshift, RDS, DynamoDB
  • Knowledge of real-time streaming (Kafka, Spark Structured Streaming)
  • Experience building ML feature pipelines
  • Background in performance tuning and capacity planning for large Spark clusters

Responsibilities

  • Build and maintain high-scale ETL/ELT pipelines across diverse data sources
  • Implement and optimize Databricks workflows using PySpark, Python, DLT, and Unity Catalog
  • Configure and manage AWS environments including VPCs, IAM, S3, and secure connectivity
  • Establish CI/CD pipelines using GitHub Actions for automated deployment of Databricks notebooks, jobs, and pipelines
  • Drive data quality via automated testing frameworks (unit, integration, performance)
  • Optimize cluster performance and cost efficiency
  • Lead best practices in Medallion Architecture, ACID data principles, and high-performance SQL
  • Create clear technical documentation, architecture diagrams, and design specifications
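The data-quality responsibility above is often implemented as assertion-style checks that run in CI before a pipeline is promoted. A minimal pure-Python sketch, with function names and sample records that are illustrative rather than taken from any specific framework:

```python
def check_not_null(rows, column):
    """Return (passed, offending_rows) for a null check on one column."""
    bad = [r for r in rows if r.get(column) is None]
    return len(bad) == 0, bad

def check_unique(rows, column):
    """Return (passed, values) for a uniqueness check on one column."""
    values = [r[column] for r in rows]
    return len(values) == len(set(values)), values

# Illustrative records, e.g. a sample pulled from a silver table.
sample = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": 7.5},
]

ok_nulls, _ = check_not_null(sample, "amount")
ok_unique, _ = check_unique(sample, "order_id")
assert ok_nulls and ok_unique  # both checks pass on this clean sample
```

Checks like these are usually wired into the CI pipeline (e.g. as pytest cases triggered by GitHub Actions) so a failing rule blocks deployment of the job that produced the data.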

Benefits

  • We help take care of your today and tomorrow with industry-leading benefits, support, and services that look after your holistic health and wellbeing.
  • We’re also champions of life balance and offer flexible arrangements that work for you (role and location dependent).