BIBA Practice - Cloud Data Lead

HEXAWARE
United States, Onsite

About The Position

We are seeking a Senior Spark Engineer with strong Java expertise to design, develop, and operate high-performance, production-scale data processing pipelines. The role focuses on Apache Spark-based batch and streaming solutions, robust ETL, performance tuning, and close collaboration with data engineering, data science, and platform teams.

Requirements

  • 5+ years of software engineering experience, including at least 3 years building production systems with Apache Spark.
  • Strong Java development skills (Java 8+); solid understanding of concurrent programming, memory management, and JVM tuning.
  • Production experience with Spark Core, Spark SQL, and Structured Streaming.
  • Hands-on experience with Hadoop ecosystem components (HDFS, YARN, Hive) or cloud object storage (S3/GCS/Azure Blob).
  • Experience integrating with Kafka or other message brokers for real-time ingestion.
  • Experience with Scala or Python (PySpark) for cross-language integrations.
  • Java (primary): language proficiency, performance profiling, GC tuning.
  • Apache Spark: job design, RDD/DataFrame/Dataset APIs, Catalyst optimizer understanding.
  • Structured Streaming: exactly-once semantics, watermarking, state management.
  • Data storage: Hive, Parquet/ORC, Avro, schema evolution best practices.
  • Messaging & ingestion: Apache Kafka (producers/consumers, Kafka Connect connectors).
  • Orchestration & CI/CD: Airflow, Jenkins/GitHub Actions/GitLab CI or equivalent.
  • Containerization/cluster deployment: YARN; Kubernetes experience for running Spark on K8s.
  • Monitoring & observability: Prometheus/Grafana, ELK/EFK stack, or cloud-native equivalents.
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or equivalent practical experience.

Responsibilities

  • Own design and development of scalable data pipelines using Apache Spark for batch and streaming workloads.
  • Implement Spark applications in Java (primary) and integrate with the broader data platform (HDFS/S3, Hive, Kafka, relational and NoSQL stores).
  • Optimize Spark jobs for performance, memory usage, and resource efficiency; troubleshoot production issues and reduce job failures/latency.
  • Develop reusable libraries, frameworks, and abstractions to accelerate data engineering work.
  • Implement data ingestion, transformation, and enrichment patterns, ensuring data quality, schema evolution handling, and idempotence.
  • Integrate Spark workloads with orchestration and scheduling systems (Airflow, NiFi, or equivalent) and with downstream stores such as Elasticsearch.
  • Build and maintain CI/CD pipelines, automated tests (unit/integration), and deployment practices for data applications.
  • Collaborate with data scientists to productionize models and feature engineering pipelines.
  • Drive observability and monitoring for Spark jobs (metrics, logging, alerting).
  • Mentor mid-level and junior engineers and review their work; participate in architecture and design reviews.
  • Ensure security, governance, and compliance requirements are met for data processing.