Senior Data Engineer

Hitachi Digital Services, Reading, PA

About The Position

We’re Hitachi Digital Services, a global digital solutions and transformation business with a bold vision of our world’s potential. We’re people-centric and here to power good. Every day, we future-proof urban spaces, conserve natural resources, protect rainforests, and save lives. This is a world where innovation, technology, and deep expertise come together to take our company and customers from what’s now to what’s next. We make it happen through the power of acceleration.

Imagine the sheer breadth of talent it takes to bring a better tomorrow closer to today. We don’t expect you to ‘fit’ every requirement: your life experience, character, perspective, and passion for achieving great things in the world are equally important to us.

You will be part of a high-impact Data & Analytics engineering team focused on building scalable, cloud-native data platforms that power enterprise analytics and mission-critical business applications. The team collaborates closely with data architects, platform leads, and cross-functional stakeholders to design modern data lake architectures and real-time data ecosystems on AWS. As a Senior Data Engineer, you will play a hands-on role in designing, building, and operating high-performance batch and streaming data platforms.

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related discipline.
  • Strong hands-on expertise in PySpark, Spark SQL, and distributed data processing.
  • Advanced proficiency in Python for building scalable, production-grade data solutions and microservices.
  • Proven experience building and running Kafka-based streaming applications in production environments.
  • Deep understanding of streaming fundamentals, including stateful processing and fault tolerance.
  • Hands-on experience with Apache Iceberg in production data lake environments.
  • Solid experience with AWS data services (S3, EMR, Glue, Lambda, Redshift, Aurora).
  • Advanced SQL skills and strong knowledge of data modeling and modern data lake architectures.
  • Strong troubleshooting skills in distributed data systems with a focus on reliability and performance.

Responsibilities

  • Design, develop, and maintain large-scale batch and streaming pipelines using PySpark and Python.
  • Build real-time and near real-time streaming applications with stateful processing, windowing, and checkpointing.
  • Develop production-grade Python microservices for complex data transformations and business logic.
  • Design and manage modern data lake architectures using Apache Iceberg on AWS S3, implementing schema evolution, partitioning, compaction, and time travel.
  • Develop and deploy pipelines across AWS services including S3, EMR, Glue, Lambda, Athena, Redshift, and Aurora.
  • Optimize Spark workloads for performance, scalability, and cost efficiency.
  • Implement monitoring, logging, alerting, and recovery mechanisms for robust production operations.
  • Contribute to CI/CD pipelines, participate in architecture discussions, and uphold engineering best practices.
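A flavor of the streaming work described above: the core of a tumbling-window aggregation, which in production would run as a PySpark Structured Streaming job with checkpointing, can be sketched in plain Python. This is an illustrative sketch only; the function name and event shape are assumptions, not from this posting.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds):
    """Count events per key within fixed, non-overlapping (tumbling) windows.

    Mirrors the idea behind Spark's groupBy(window(...)) aggregation:
    each event is assigned to the window containing its timestamp.
    `events` is a list of (timestamp_seconds, key) tuples.
    """
    counts = defaultdict(int)
    for ts, key in events:
        # Floor the timestamp to the start of its window.
        window_start = (ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)

# Events at t=0 and t=5 land in window [0, 10); t=12 and t=13 in [10, 20).
result = tumbling_window_counts([(0, "a"), (5, "a"), (12, "b"), (13, "a")], 10)
# → {(0, "a"): 2, (10, "b"): 1, (10, "a"): 1}
```

In a real pipeline the state above would be managed by the streaming engine, with checkpointing to durable storage (e.g. S3) for fault tolerance and recovery.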