Data Engineer

Apple
Cupertino, CA
Onsite

About The Position

Design, build, and maintain data pipelines that extract data from various sources, such as databases (PostgreSQL, Cassandra, Iceberg, and Hadoop), APIs, data lakes, cloud storage, and log files, in order to collect and consolidate data from multiple sources into a central data warehouse for reporting, analytics, and business intelligence purposes. Understand data sources, configure data extraction processes, manage data ingestion using PySpark or Python, and automate the pipelines using Airflow to power data sources for analytics platforms like Tableau. Collaborate with machine learning engineers, data scientists, analysts, software engineers, and managers to understand their data requirements and deliver reliable, distributed data pipelines that feed data analytics and data visualization platforms, allowing Apple's stakeholders to easily leverage data in a self-service manner.

Perform data transformation tasks, including data cleaning, normalization, aggregation, and enrichment, to prepare data for analytics and reporting pipelines. Utilize SQL, scripting languages (Python), and ETL (Extract, Transform, Load) tools to manipulate and prepare data for predictive, statistical, and trend analysis. Develop new and creative methodologies, such as self-optimizing data pipelines and a unified pipeline that integrates and harmonizes data streams from various sources in real time, to evaluate test coverage and test pass rates and continually improve Siri by delivering feedback to engineering partners. Optimize existing data pipelines and database queries to improve performance and minimize the latency of Tableau dashboards. Identify and resolve bottlenecks, streamline data transformation processes, and implement indexing strategies to improve data retrieval performance in databases.
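
As a flavor of the ingestion and automation work described above, here is a minimal, hypothetical Airflow (2.4+) sketch of an extract-and-load pipeline; the DAG id, schedule, and task callables are illustrative placeholders, not an actual Apple pipeline:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract_from_postgres():
        ...  # placeholder: pull new rows from a source PostgreSQL table

    def load_to_warehouse():
        ...  # placeholder: write consolidated rows to the central warehouse

    with DAG(
        dag_id="example_ingestion_pipeline",  # hypothetical name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",  # Airflow 2.4+; older versions use schedule_interval
        catchup=False,
    ) as dag:
        extract = PythonOperator(task_id="extract", python_callable=extract_from_postgres)
        load = PythonOperator(task_id="load", python_callable=load_to_warehouse)
        extract >> load  # extract must finish before load starts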

Requirements

  • Master's degree or foreign equivalent in Computer Science, Engineering, Mathematics, Statistics, Business Analytics or a related field
  • 2 years of experience in the job offered or related occupation
  • 1 year of experience utilizing Tableau, including data preparation, data modeling, and data visualization
  • Experience monitoring metrics and assessing early signals or trends to identify or predict issues
  • Experience utilizing Java to build data infrastructure
  • Experience with Java libraries for improving data processing pipelines, and experience processing large volumes of data
  • Experience utilizing Microsoft Azure or Google Cloud (formerly Google Cloud Platform, GCP) to build, manage, and analyze data at scale in a cloud environment
  • Experience utilizing Jupyter to prototype and explore data, and performing data manipulation, analysis, and transformation
  • Experience extracting data from Iceberg, PostgreSQL databases, and static Excel (CSV) files
  • Experience optimizing Extract, Transform, Load (ETL) pipelines
  • Experience utilizing Hadoop and Cassandra to store large volumes of structured and unstructured data
  • Experience utilizing MySQL and PostgreSQL to store and query relational data, and experience manipulating tables, tuning performance, and designing databases
  • Experience utilizing Python and Spark to automate data-related workflows and processes
  • Experience transforming data using NumPy and pandas (a pandas sketch follows this list)
  • Experience performing statistical analysis and visualization
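
As a flavor of the NumPy/pandas requirement above, here is a minimal sketch covering the cleaning, normalization, and aggregation steps the role calls for; the file name and column names are hypothetical:

    import numpy as np
    import pandas as pd

    df = pd.read_csv("events.csv")                     # hypothetical static CSV extract
    df = df.dropna(subset=["user_id"])                 # cleaning: drop rows missing a key
    df["latency_ms"] = df["latency_ms"].clip(lower=0)  # cleaning: floor impossible values
    # normalization: z-score the latency column
    df["latency_z"] = (df["latency_ms"] - df["latency_ms"].mean()) / df["latency_ms"].std()
    # aggregation: one summary row per day
    daily = (
        df.groupby("event_date")
          .agg(events=("user_id", "count"),
               p95_latency=("latency_ms", lambda s: np.percentile(s, 95)))
          .reset_index()
    )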

Responsibilities

  • Design, build, and maintain data pipelines that extract data from various sources such as databases (PostgreSQL, Cassandra, Iceberg, and Hadoop), APIs, data lakes, cloud storage, or log files
  • Collect and consolidate data from multiple sources into a central data warehouse for reporting, analytics, and business intelligence purposes
  • Understand data sources, configure data extraction processes, manage data ingestion using PySpark or Python, and automate the pipelines using Airflow to power data sources for analytics platforms like Tableau (a PySpark sketch follows this list)
  • Collaborate with machine learning engineers, data scientists, analysts, software engineers, and managers to understand their data requirements and deliver reliable, distributed data pipelines that feed data analytics and data visualization platforms, allowing Apple's stakeholders to easily leverage data in a self-service manner
  • Perform data transformation tasks, including data cleaning, normalization, aggregation, and enrichment to prepare data for analytics and reporting pipelines
  • Utilize SQL, scripting languages (Python), and ETL (Extract, Transform, Load) tools to manipulate and prepare data for predictive, statistical, and trend analysis
  • Develop new and creative methodologies, such as self-optimizing data pipelines and a unified pipeline that integrates and harmonizes data streams from various sources in real time, to evaluate test coverage and test pass rates and continually improve Siri by delivering feedback to engineering partners
  • Optimize existing data pipelines and database queries to improve performance and minimize the latency of Tableau dashboards
  • Identify and resolve bottlenecks, streamline data transformation processes, and implement indexing strategies to improve data retrieval performance in databases
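
As referenced in the ingestion bullet above, here is a minimal PySpark sketch of reading from PostgreSQL over JDBC, applying a light daily aggregation, and landing the result in a data lake; the connection details, table, and output path are hypothetical, and a real job would also need the PostgreSQL JDBC driver on the Spark classpath:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("ingest_orders").getOrCreate()

    orders = (
        spark.read.format("jdbc")
        .option("url", "jdbc:postgresql://db-host:5432/sales")  # hypothetical source
        .option("dbtable", "public.orders")
        .option("user", "etl_user")
        .option("password", "***")
        .load()
    )

    # Light transformation before landing: derive a date and aggregate per day
    daily = (
        orders.withColumn("order_date", F.to_date("created_at"))
              .groupBy("order_date")
              .agg(F.count("*").alias("orders"), F.sum("amount").alias("revenue"))
    )

    daily.write.mode("overwrite").partitionBy("order_date").parquet("s3://lake/daily_orders/")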

Benefits

  • Comprehensive medical and dental coverage
  • Retirement benefits
  • A range of discounted products and free services
  • Reimbursement for certain educational expenses — including tuition