Overview

What You'll Achieve:
- Design, develop, and maintain scalable data processing solutions across on-premises and cloud environments using Python and Apache Spark.
- Optimize and fine-tune Spark jobs for performance, including resource utilization, shuffling, partitioning, and caching, to ensure maximum efficiency in large-scale big data environments.
- Design and implement scalable, fault-tolerant data pipelines with end-to-end monitoring, alerting, and logging.
- Leverage AWS cloud services to build and manage data pipelines and distributed processing workloads (2+ years of AWS experience preferred).
- Develop and optimize SQL queries across relational and data warehouse (RDBMS/Data Warehouse) systems.
- Apply design patterns and best practices for efficient data modeling, partitioning, and distributed system performance.
- Use Git (or equivalent) for source control and maintain strong unit and integration testing practices.
- Collaborate with Product Owners, partners, and multi-functional teams to translate business requirements into technical solutions.
- Demonstrate strong analytical skills, extracting actionable insights from large datasets to support data-driven decision-making.
- Mentor junior engineers, conduct code reviews, and help establish engineering best practices and standards.

Who You Are:
- 5+ years of software development experience in a scalable, distributed, or multi-node environment.
- Proficient in Scala, Python, or Java; comfortable building data-driven solutions at scale.
- Significant experience with Apache Spark and exposure to Hadoop, Hive, and related big data technologies.
- Demonstrated experience with cloud platforms (AWS preferred) and an interest in cloud migration projects.
- Eager to deepen your expertise with the Databricks platform.
- Exposure to modern data tools and frameworks such as Kubernetes, Docker, and Airflow.
- Strong problem-solving skills with the ability to own problems end-to-end and deliver results.
- Consultative mentality: comfortable taking initiative, building relationships, communicating broadly, and tackling challenges head-on.
- Collaborative teammate, eager to learn from peers and mentors while contributing to a culture of growth.
- Motivated to grow your career within a dynamic, innovative company.

What You'll Bring:
- BA/BS in Computer Science or a related field.
- At least 5 years of experience as a big data software developer.
- Experience in Machine Learning, including model development, feature engineering, or integrating ML workflows into data pipelines.
- Experience with Databricks (Notebooks, Delta Lake, Jobs, Pipelines, and Unity Catalog) preferred.
- Proficiency with the ELK stack (Elasticsearch, Logstash, Kibana) for real-time search, log analysis, and visualization, preferred.
- AWS, Databricks, or Spark certification a plus.
Job Type: Full-time
Career Level: Mid Level
Industry: Professional, Scientific, and Technical Services
Number of Employees: 5,001-10,000