Google Cloud Data Architect & IAM Data Modernization

Retail Industry • Dallas, TX • Hybrid

About The Position

This role focuses on the Identity & Access Management (IAM) Data Modernization project, which involves migrating an on-premises SQL data warehouse to a target-state Data Lake on Google Cloud Platform (GCP). The goal is to enable metrics and reporting, advanced analytics, and GenAI use cases. The project will use PySpark-based processing, cloud-native DevOps CI/CD pipelines, and containerized deployments on OpenShift Container Platform (OCP) to deliver scalable, secure, high-performance data solutions.

Requirements

Experience implementing DevOps CI/CD pipelines for data and analytics workloads
Familiarity with Git-based source control, build automation, and deployment strategies
Experience with OpenShift Container Platform (OCP) for deploying data workloads and services
Understanding of containerized architecture, scaling, and environment management
Proven ability to build CI/CD pipelines for data and infrastructure workloads
Experience managing secrets securely using GCP Secret Manager
Ownership of observability, SLOs, dashboards, alerts, and runbooks
Proficiency in logging, monitoring, and alerting for data pipelines and platform reliability
Hands-on experience with PySpark for ETL/ELT, data transformation, and performance optimization
Solid understanding of distributed data processing concepts
Strong experience designing data platforms on Google Cloud Platform (GCP)
Experience with Data Lakes, data warehousing, and large‑scale migration programs
Proven experience designing and implementing data lake architectures (e.g., Bronze/Silver/Gold or layered models)
Strong knowledge of Cloud Storage (GCS) design, including bucket layout, naming conventions, lifecycle policies, and access controls
Experience with Hadoop/HDFS architecture, distributed file systems, and data locality principles
Hands-on experience with columnar data formats (Parquet, Avro, ORC) and compression techniques
Expertise in partitioning strategies, backfills, and large-scale data organization
Ability to design data models optimized for analytics and BI consumption
Experience building batch and streaming ingestion pipelines using GCP-native services
Knowledge of Pub/Sub-based streaming architectures, event schema design, and versioning
Strong understanding of incremental ingestion and change data capture (CDC) patterns, including idempotency and deduplication
Hands-on experience with workflow orchestration tools (Cloud Composer / Airflow)
Ability to design robust error handling, replay, and backfill mechanisms
Experience developing scalable batch and streaming pipelines using Dataflow (Apache Beam) and/or Spark (Dataproc)
Strong proficiency in BigQuery SQL, including query optimization, partitioning, clustering, and cost control
Hands-on experience with Hadoop MapReduce and ecosystem tools (Hive, Pig, Sqoop)
Advanced Python programming skills for data engineering, including testing and maintainable code design
Experience managing schema evolution while minimizing downstream impact
Expertise in BigQuery performance optimization and data serving patterns
Experience building semantic layers and governed metrics for consistent analytics
Familiarity with BI integration, access controls, and dashboard standards
Understanding of data exposure patterns via views, APIs, or curated datasets
Experience implementing data catalogs, metadata management, and ownership models
Understanding of data lineage for auditability and troubleshooting
Strong focus on data quality frameworks, including validation, freshness checks, and alerting
Experience defining and enforcing data contracts, schemas, and SLAs
Hands-on experience implementing fine-grained access controls for BigQuery and GCS
Experience with sprint planning and providing technical guidance to the team
Strong stakeholder communication and solution‑architecture skills
10–14+ years of experience in DevOps and data architecture
5+ years designing solutions at scale with PySpark, GCP, and OCP
Prior on-premises-to-cloud migration experience is required
Bachelor’s or Master’s degree in Computer Science, Information Systems, or equivalent experience

Nice To Haves

Google Cloud Professional Cloud Architect or Professional Cloud DevOps Engineer certification, or an OpenShift (OCP) certification (required, or must be obtained within 3 months).
Professional Data Engineer certification.
Security Engineer certification.

Responsibilities

Implement CI/CD pipelines for data and analytics workloads.
Deploy data workloads and services using OpenShift Container Platform (OCP).
Manage secrets securely using GCP Secret Manager.
Own observability, SLOs, dashboards, alerts, and runbooks.
Implement logging, monitoring, and alerting for data pipelines and platform reliability.
Design and implement data lake architectures (e.g., Bronze/Silver/Gold or layered models).
Design and implement batch and streaming ingestion pipelines using GCP-native services.
Design robust error handling, replay, and backfill mechanisms for data pipelines.
Develop scalable batch and streaming pipelines using Dataflow (Apache Beam) and/or Spark (Dataproc).
Optimize BigQuery performance and implement data serving patterns.
Build semantic layers and governed metrics for consistent analytics.
Implement data catalogs, metadata management, and ownership models.
Define and enforce data contracts, schemas, and SLAs.
Implement fine-grained access controls for BigQuery and GCS.
Participate in sprint planning and provide technical guidance to the team.