About The Position

Join a team building the data foundations that support the firm’s AI and analytics capabilities. This role sits within the engineering effort to develop a modern Lakehouse and AI data platform that enables reliable, well-governed and high-performing data use across the firm. At Goldman Sachs, engineering teams sit at the centre of the business, building scalable systems, solving complex technical problems and turning data into action. Data engineering roles focus on designing, building and maintaining large-scale data platforms, delivering production pipelines, improving reliability and quality, and partnering closely with users of the platform. This is a delivery-focused role for engineers who want to build robust data assets in production, work with modern data technologies and grow within the firm. You will contribute to the data models, pipelines and platform capabilities that underpin analytics, operational decision-making and emerging AI use cases.

Role Summary

As a Data Engineer on the Lakehouse and AI Data Platform, you will design, build, test and support data pipelines and curated datasets on the firm’s modern data platform. You will work across ingestion, transformation, modelling, optimisation and data quality, helping to deliver data products that are reliable, scalable and fit for purpose. The role suits engineers who are comfortable writing code, working with SQL and distributed data processing, and solving practical delivery problems in a team environment. More experienced candidates may also contribute to technical design, platform standards and the shaping of delivery approaches across a wider set of use cases.

Requirements

  • 0-2+ years of experience
  • Bachelor’s or master’s degree in a relevant discipline, or equivalent practical experience, with evidence of strong quantitative skills or data engineering expertise.
  • Strong hands-on programming experience in Python or Java.
  • Good working knowledge of SQL, including troubleshooting, optimisation and data analysis.
  • Ability to learn new tools, internal platforms and delivery workflows quickly.
  • Familiarity with software engineering fundamentals, including version control, testing, release discipline and CI/CD practices.
  • Understanding of temporal data modelling, including the handling of historical state and change over time.
  • Knowledge of schema design, schema evolution and data compatibility considerations.
  • Understanding of partitioning, clustering and other techniques used to improve data performance at scale.
  • Ability to make sensible design choices across normalised and denormalised models, and between natural and surrogate keys.
  • Practical approach to data quality, reconciliation and root-cause analysis.
  • Experience building or supporting production data pipelines in a collaborative engineering environment.
  • Experience working with distributed data processing frameworks such as Apache Spark.
  • Working knowledge of common data formats such as JSON, Avro and Parquet.
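To illustrate the temporal data modelling and surrogate-key points above, here is a minimal sketch in plain Python. The entity, field and function names (`CustomerVersion`, `apply_change`, `state_as_of`) are hypothetical, not part of the firm's platform; the pattern shown is a simple history-preserving model where each version of an entity carries its own surrogate key and a validity interval:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Illustrative temporal (history-preserving) model: each version of a
# business entity gets its own surrogate key and a validity interval,
# so its state at any past date can be reconstructed.

@dataclass
class CustomerVersion:
    surrogate_key: int        # unique per version, not per customer
    natural_key: str          # the business identifier, e.g. a customer id
    tier: str
    valid_from: date
    valid_to: Optional[date]  # None means "current version"

def apply_change(history, natural_key, new_tier, as_of, next_key):
    """Close the current version (if any) and open a new one."""
    for row in history:
        if row.natural_key == natural_key and row.valid_to is None:
            row.valid_to = as_of
    history.append(CustomerVersion(next_key, natural_key, new_tier, as_of, None))

def state_as_of(history, natural_key, as_of):
    """Return the version of the entity that was valid on `as_of`."""
    for row in history:
        if (row.natural_key == natural_key
                and row.valid_from <= as_of
                and (row.valid_to is None or as_of < row.valid_to)):
            return row
    return None

history = []
apply_change(history, "C-1", "bronze", date(2023, 1, 1), next_key=1)
apply_change(history, "C-1", "gold", date(2024, 6, 1), next_key=2)

print(state_as_of(history, "C-1", date(2023, 12, 31)).tier)  # bronze
print(state_as_of(history, "C-1", date(2024, 6, 1)).tier)    # gold
```

The half-open interval (`valid_from` inclusive, `valid_to` exclusive) is one common convention for avoiding overlapping versions; production schemas vary.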

Responsibilities

  • Build, enhance and support batch and streaming data pipelines on the Lakehouse and AI data platform.
  • Refactor or modernise existing data flows where needed to improve reliability, performance and maintainability.
  • Ensure data pipelines are production-ready, well tested and operationally supportable.
  • Develop raw, refined and curated datasets that support analytics, reporting and AI use cases.
  • Apply sound data modelling principles to represent business entities, relationships and historical change accurately.
  • Work with consumers to shape data products that are usable, well documented and aligned to business needs.
  • Implement controls to validate completeness, accuracy and consistency of data across pipelines and datasets.
  • Use reconciliation approaches to build confidence in production outputs and investigate breaks where they arise.
  • Contribute to clear standards for testing, monitoring and issue resolution.
  • Work closely with engineers, platform teams and data consumers to deliver agreed outcomes to time and quality expectations.
  • Communicate clearly on progress, risks, dependencies and design choices.
  • For more senior candidates, take a broader role in technical leadership, task breakdown and support for junior engineers.
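The reconciliation and data-quality responsibilities above can be sketched as a simple control: compare a source extract against the loaded dataset on row count and per-key totals, and report any breaks for investigation. This is a hypothetical, minimal example; the field names and the `reconcile` helper are illustrative:

```python
from collections import Counter

# Illustrative reconciliation control: compare source and loaded datasets
# on row count and per-key amount totals, returning a list of breaks.

def reconcile(source_rows, loaded_rows, key_field, amount_field):
    """Return human-readable breaks; an empty list means reconciled."""
    breaks = []
    if len(source_rows) != len(loaded_rows):
        breaks.append(
            f"row count mismatch: source={len(source_rows)} loaded={len(loaded_rows)}"
        )
    src_totals = Counter()
    for row in source_rows:
        src_totals[row[key_field]] += row[amount_field]
    tgt_totals = Counter()
    for row in loaded_rows:
        tgt_totals[row[key_field]] += row[amount_field]
    for key in sorted(set(src_totals) | set(tgt_totals)):
        if src_totals[key] != tgt_totals[key]:
            breaks.append(
                f"total mismatch for {key}: source={src_totals[key]} loaded={tgt_totals[key]}"
            )
    return breaks

source = [{"account": "A", "amount": 100}, {"account": "B", "amount": 50}]
loaded = [{"account": "A", "amount": 100}, {"account": "B", "amount": 40}]
print(reconcile(source, loaded, "account", "amount"))
```

In practice such checks would run against pipeline outputs at scale (e.g. in Spark or SQL), but the root-cause workflow is the same: a non-empty break list triggers investigation before the data is signed off.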

What This Job Offers

  • Job Type: Full-time
  • Career Level: Entry Level
  • Number of Employees: 5,001-10,000 employees
