About The Position

The Amazon Web Services (AWS) Open Data Analytics (ODA) organization is looking for exceptional engineers to help in our mission to provide the world’s best cloud Big Data processing platform and services such as EMR and Athena. The ODA engines team is looking for an experienced engineer to join the core engines and datalake team.
Athena and EMR are services that our customers use to run large-scale analytics, leveraging open-source engines like Apache Spark and Trino, with datalake open table formats like Apache Iceberg, Hudi and Delta. The analytics engines organization makes significant modifications to these engines so that they run in serverless environments with better performance and scalability than what is available in open source. In the last 3 years we have improved our engines by a factor of 5 by making changes to the optimizer, query runtime and storage connectors. We have also made significant changes to the compiler to enable enterprise features like fine-grained access control with these engines and table formats. Additionally, we strive to regularly contribute features, bug fixes and optimizations back to open source, as well as stay current with the latest open-source versions of these frameworks. This is a “must-win” strategic area in a growing and very technical space.
We are seeking a passionate and hands-on engineer to collaborate closely with open-source communities like Apache Iceberg and Apache Spark, driving innovations in query engines and table format integrations. In this role, you will focus on performance optimizations, feature enhancements, stability improvements, and security hardening, making deep contributions across the query engine and table format codebases. As a key member of the Engines team, you will shape the technical direction, influence design decisions, implement critical features, and foster collaboration with both internal teams and the open-source community.
AWS Utility Computing (UC) provides product innovations, from foundational services such as Amazon’s Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2) to a steady stream of new products that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services.

Requirements

  • 3+ years of non-internship professional software development experience
  • 2+ years of non-internship design or architecture (design patterns, reliability and scaling) of new and existing systems experience
  • 3+ years of programming using a modern programming language such as Java, C++, or C#, including object-oriented design experience

Nice To Haves

  • 3+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience
  • Bachelor's degree in computer science or equivalent
  • Experience in developing and operating distributed systems or applications at large scale
  • Experience working on open table formats (Iceberg, Hudi, Delta) or query engines (Spark, Trino, Flink, etc.) is a huge plus
  • Experience contributing to open-source codebases and collaborating with open-source communities

Responsibilities

  • Develop and optimize core components of query engines and open table formats (Iceberg, Hudi, Delta) to enhance performance, scalability, and reliability.
  • Design and implement innovative solutions and algorithms to improve feature capabilities, stability, and security in table format integrations with query engines.
  • Collaborate with the open-source community, contributing to discussions, driving improvements, and integrating upstream changes.
  • Ensure data consistency and durability while achieving breakthrough performance and scalability for large-scale data lake workloads.
  • Improve the organization's automation and testing capabilities.
  • Manage complex project deliverables and research projects with deadlines.
  • Mentor and train other team members on design techniques and coding best practices.
  • Be a point of contact for challenging customer issues related to data lake workloads and query engines.

Benefits

  • health insurance (medical, dental, vision, prescription, basic life and AD&D insurance with an option for supplemental life plans, EAP, mental health support, medical advice line, flexible spending accounts, adoption and surrogacy reimbursement coverage)
  • 401(k) matching
  • paid time off
  • parental leave