Hadoop Big Data Developer

Bright Vision Technologies•Lawrenceville, GA

1d•$100,000 - $150,000•Remote

About The Position

Bright Vision Technologies is a forward-thinking software development company dedicated to building innovative solutions that help businesses automate and optimize their operations. We leverage cutting-edge technologies to create scalable, secure, and user-friendly applications. As we continue to grow, we’re looking for a skilled Hadoop Big Data Developer to join our dynamic team and contribute to our mission of transforming business processes through technology. This is a fantastic opportunity to join an established and well-respected organization offering tremendous career growth potential. We are seeking an experienced Hadoop Big Data Developer to design, build, and operate large-scale data processing pipelines and analytics platforms on Hadoop and related big-data ecosystems. In this role you will be responsible for ingesting, transforming, and analyzing massive volumes of structured and unstructured data to support enterprise analytics, machine learning, and reporting workloads. The ideal candidate will combine deep technical expertise across the Hadoop ecosystem with strong software engineering fundamentals and a clear understanding of how to deliver reliable, performant, and cost-effective data platforms in production environments.

Requirements

Bachelor’s degree in Computer Science, Engineering, or a related technical discipline.
Five or more years of professional experience designing and operating big-data pipelines on Hadoop.
Strong hands-on expertise with Apache Spark (Scala, Python, or Java) in production environments.
Solid experience with Hive, HDFS, Sqoop, HBase, and the broader Hadoop ecosystem.
Hands-on experience with streaming data platforms such as Kafka, Spark Streaming, or Flink.
Strong SQL skills and experience working with both relational and NoSQL data stores.
Experience with workflow orchestration tools such as Airflow or Oozie.
Solid understanding of distributed systems concepts, including partitioning, replication, and fault tolerance.
Strong scripting skills in Python or Shell.
Excellent troubleshooting, debugging, and documentation skills.

Nice To Haves

Experience operating Hadoop on cloud platforms such as AWS EMR, Azure HDInsight, or Databricks.
Familiarity with modern lakehouse formats (Delta, Iceberg, Hudi).
Exposure to data governance tooling such as Apache Atlas or Collibra.
Experience with Kubernetes-based data platforms (Spark-on-K8s, Trino).
Hands-on experience with CI/CD and infrastructure-as-code in data engineering workflows.

Responsibilities

Design, develop, and operate end-to-end big-data pipelines on Hadoop, ingesting data from a diverse mix of relational, file-based, streaming, and API-driven sources.
Build robust ETL/ELT workflows using Apache Spark, Hive, Pig, and Sqoop, with strong attention to data quality, idempotency, error handling, and recoverability.
Develop high-throughput streaming data pipelines using Kafka, Spark Streaming, or Flink, and integrate them with downstream analytical and operational systems.
Optimize Spark and MapReduce jobs through careful tuning of partitioning, memory, serialization, and skew handling to meet demanding SLAs at minimal cost.
Design and maintain data models and storage layouts on HDFS, Hive, HBase, and modern lakehouse formats (Parquet, ORC, Delta, Iceberg, Hudi) to balance flexibility and performance.
Implement data governance, lineage, and quality controls in collaboration with data governance and security teams.
Build robust monitoring, alerting, and logging strategies for big-data pipelines, including job-level SLAs and proactive failure detection.
Partner with data scientists and analysts to deliver curated, reliable, and well-documented datasets that accelerate their work.
Automate pipeline orchestration using Airflow, Oozie, or similar workflow engines, with clean dependency management and clear ownership boundaries.
Continuously evaluate and adopt new technologies in the big-data and cloud ecosystem (EMR, Databricks, Snowflake, BigQuery) where they offer meaningful improvements.
Lead performance reviews and architecture audits of existing pipelines, proposing concrete refactoring and optimization initiatives.
Document data architectures, schemas, pipeline behaviors, and operational runbooks in a way that makes the platform supportable as the team scales.
Mentor junior engineers and contribute to the team’s engineering standards and best practices.