PPCO Data Analytics – Systems & Data Analytics

General Motors, Warren, MI (Hybrid)

About The Position

The Program and Purchase Cost Optimization (PPCO) team at General Motors is seeking a highly motivated and technically skilled Data Analytics engineer to lead analytics initiatives that enable product- and program-level cost optimization through robust data engineering, exploratory data analysis (EDA), and predictive modeling. This role sits within the PPCO Systems and Data Analytics teams and focuses on building and maintaining the data foundations that power cost analytics initiatives across GM Finance.

You will work with data from across the GM ecosystem, identifying interconnections between enterprise applications from different functional areas (e.g., engineering, purchasing, finance, program management) and designing scalable data structures and pipelines that turn disparate and unstructured data into trusted, analysis-ready assets. The ideal candidate has deep experience in data engineering, ETL/ELT, database and table management, and EDA, combined with strong skills in classical predictive modeling. You will use current data to understand behaviors and to predict product attributes such as manufacturing parameters, product cost, program development cost creep, and economic factors based on parameters and features extracted from existing data.

You will partner closely with Data Analysts, IT, and cross-functional stakeholders to design and implement data architectures, curate high-quality datasets, and develop predictive models that deliver meaningful, actionable insights to improve vehicle profitability and program performance.

Requirements

  • Ability to translate ambiguous business questions into well-defined analytical and data engineering problems, and communicate findings and recommendations in a clear, structured manner to technical and non-technical stakeholders.
  • Bachelor’s degree in computer science, engineering, statistics, mathematics, physics, or a related quantitative field (advanced degree preferred).
  • 5+ years as a data scientist / research scientist / ML engineer (or ~2 years with MS).
  • Advanced Python proficiency (e.g., pandas, NumPy, scikit-learn, PySpark).
  • Advanced SQL proficiency.
  • Experience with Databricks, Spark, and/or other cloud-based data platforms for large-scale data processing.
  • Experience designing and implementing ETL/ELT pipelines that integrate data from multiple transactional and analytical systems.
  • Strong skills in EDA to understand data quality, structure, and relationships.
  • Practical experience developing predictive models and machine learning (regression methods, clustering and segmentation, random forests, gradient boosting).
  • Solid problem-solving skills and the ability to convert business questions into data and modeling problems with clear hypotheses and success criteria.
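To make the "regression methods" requirement concrete, here is a minimal sketch of fitting a classical cost model with NumPy. The features (part mass, a material price index) and the cost relationship are invented for illustration only; real PPCO features would be engineered from GM enterprise data.

```python
import numpy as np

# Hypothetical example: predict piece cost from part mass and a material
# price index using ordinary least squares on synthetic data.
rng = np.random.default_rng(42)
n = 200
mass_kg = rng.uniform(1.0, 50.0, n)          # invented feature: part mass
material_idx = rng.uniform(0.5, 2.0, n)      # invented feature: commodity index
cost = 3.0 * mass_kg + 12.0 * material_idx + 5.0 + rng.normal(0.0, 0.5, n)

# Design matrix with an intercept column; least-squares fit.
X = np.column_stack([mass_kg, material_idx, np.ones(n)])
coef, *_ = np.linalg.lstsq(X, cost, rcond=None)

# Goodness of fit on the training sample (R^2).
pred = X @ coef
ss_res = np.sum((cost - pred) ** 2)
ss_tot = np.sum((cost - cost.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
print(f"coefficients: {coef.round(2)}, R^2: {r2:.3f}")
```

In practice the same workflow scales up through scikit-learn estimators (random forests, gradient boosting) and a proper train/test split, but the fit-predict-validate loop is the same.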

Nice To Haves

  • Experience working with automotive, manufacturing, or engineering data.
  • Exposure to advanced ML/AI techniques (e.g., deep learning, NLP, LLMs) is a plus but not required; the primary focus of this role is data engineering, EDA, and classical predictive modeling in support of PPCO’s cost optimization mission.

Responsibilities

  • Design and maintain data models and curated datasets (relational schemas, dimensional models, and feature characterizations) that support PPCO analytics and efficient downstream consumption for reporting in tools such as Power BI (while not being primarily responsible for dashboard design).
  • Query, integrate, and engineer data from multiple enterprise systems and relational databases, and design, build, and operate robust ETL/ELT pipelines (e.g., Databricks/Spark) to produce unified, analysis-ready datasets with appropriate data quality checks, validation, and monitoring.
  • Perform in-depth EDA on large, heterogeneous datasets, engineer derived features (characterizations, aggregations, encodings), understand data distributions, identify anomalies, and uncover data quality risks and insights for the business team to address.
  • Develop and validate predictive machine learning models and descriptive models (regression, clustering, decision trees, random forests, gradient boosting, time-series/panel models), utilizing existing data to predict product attributes (geometries, cost, economic indicators) from known input parameters and historical patterns.
  • Enable system integration across the GM ecosystem by designing data pipelines, workflow automation, and data transformations that take data from a source system and turn it into a consumable format for the destination system.
  • Automate and optimize existing cost engineering workflows by building scalable, Python-based data and analytics tools.
  • Work across functional boundaries, including Engineering, Program Management, R&D, Finance, and Purchasing, to understand data sources, business logic, and use cases.
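As a minimal sketch of the multi-system integration responsibility above: join records from two source systems on a shared key and flag data quality gaps. The system names, record shapes, and field names are invented for this illustration.

```python
# Hypothetical records from two enterprise systems, keyed on part number.
engineering = [
    {"part_no": "A100", "mass_kg": 2.5},
    {"part_no": "A200", "mass_kg": 7.1},
    {"part_no": "A300", "mass_kg": 1.2},
]
purchasing = [
    {"part_no": "A100", "piece_cost": 14.90},
    {"part_no": "A200", "piece_cost": 32.40},
]

def integrate(eng_rows, buy_rows):
    """Left-join purchasing cost onto engineering parts; collect quality issues."""
    cost_by_part = {r["part_no"]: r["piece_cost"] for r in buy_rows}
    unified, issues = [], []
    for row in eng_rows:
        cost = cost_by_part.get(row["part_no"])
        if cost is None:
            # Data quality check: part exists in engineering but has no cost.
            issues.append(f"missing cost for {row['part_no']}")
        unified.append({**row, "piece_cost": cost})
    return unified, issues

unified, issues = integrate(engineering, purchasing)
print(issues)
```

At production scale the same join-and-validate pattern would run as a Spark/Databricks pipeline rather than in-memory Python, with the quality issues routed to monitoring.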


What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Number of Employees: 5,001-10,000 employees
