Performance Optimization
- Analyze and optimize existing Hadoop/Spark pipelines to improve processing speed, resource utilization, and reliability.
- Identify bottlenecks in data workflows and implement solutions that reduce processing time and costs.
- Tune Spark jobs, Hive queries, and Impala performance through partitioning strategies, caching, and execution plan optimization (see the first sketch after this list).
- Design and build scalable data pipelines using Spark (Scala) to process terabytes of data efficiently.
- Develop robust ETL/ELT workflows that integrate data from multiple sources into the Hadoop environment and Oracle data warehouses.
- Implement data quality checks and monitoring to ensure pipeline reliability (see the second sketch after this list).
- Work closely with product teams to understand requirements and deliver data solutions.
- Participate in code reviews and contribute to engineering best practices.
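For context on what "partitioning strategies, caching, and execution plan optimization" typically involve in this stack, here is a minimal Spark (Scala) sketch. The input path /data/events, the partition column event_date, and the key user_id are hypothetical placeholders, not details from this posting.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object TuningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("tuning-sketch")
      // Shuffle parallelism should be sized to the cluster; 200 is Spark's default.
      .config("spark.sql.shuffle.partitions", "200")
      .getOrCreate()

    // Hypothetical events table, stored as Parquet and partitioned by event_date.
    val events = spark.read.parquet("/data/events")

    // Partition pruning: filtering on the partition column means only the
    // matching date directories are scanned.
    val recent = events.filter(col("event_date") === "2024-01-15")

    // Cache a DataFrame that several downstream aggregations reuse.
    recent.cache()

    // Inspect the physical plan to confirm pruning and spot wide shuffles.
    recent.groupBy(col("user_id")).count().explain(true)

    // Repartition by the aggregation key to reduce skewed shuffles before writing.
    recent
      .repartition(col("user_id"))
      .write
      .mode("overwrite")
      .parquet("/data/events_by_user")

    spark.stop()
  }
}
```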
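Likewise, a minimal sketch of the kind of data quality checks such a pipeline might run before publishing a table. The validate helper, its parameters, and the fail-fast require strategy are illustrative assumptions, not a method described in this posting.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, countDistinct}

object QualityChecks {
  // Fail fast if required columns contain nulls or the key column is not unique.
  def validate(df: DataFrame, keyCol: String, requiredCols: Seq[String]): Unit = {
    val total = df.count()

    // Null checks on every required column.
    requiredCols.foreach { c =>
      val nulls = df.filter(col(c).isNull).count()
      require(nulls == 0, s"column $c has $nulls null rows out of $total")
    }

    // Uniqueness check on the key column.
    val distinct = df.select(countDistinct(col(keyCol))).first().getLong(0)
    require(distinct == total, s"key $keyCol: $distinct distinct values vs $total rows")
  }
}
```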
Job Type
Full-time
Career Level
Mid Level
Number of Employees
5,001-10,000 employees