Google Cloud Data Architect & IAM Data Modernization

Retail Industry
Dallas, TX
Hybrid

About The Position

This role focuses on the Identity & Access Management (IAM) Data Modernization project, which involves migrating an on-premises SQL data warehouse to a target-state data lake on Google Cloud Platform (GCP). The goal is to support metrics and reporting, advanced analytics, and GenAI use cases. The project will use PySpark-based processing, cloud-native DevOps CI/CD pipelines, and containerized deployments on OpenShift (OCP) to deliver scalable, secure, high-performance data solutions.

Requirements

  • DevOps and CI/CD experience, including implementing CI/CD pipelines for data and analytics workloads
  • Familiarity with Git‑based source control, build automation, and deployment strategies
  • Experience with OpenShift Container Platform (OCP) for deploying data workloads and services
  • Understanding of containerized architecture, scaling, and environment management
  • Proven ability to build CI/CD pipelines for data and infrastructure workloads
  • Experience managing secrets securely using GCP Secret Manager
  • Ownership of observability, SLOs, dashboards, alerts, and runbooks
  • Proficiency in logging, monitoring, and alerting for data pipelines and platform reliability
  • Hands-on experience with PySpark for ETL/ELT, data transformation, and performance optimization
  • Solid understanding of distributed data processing concepts
  • Strong experience designing data platforms on Google Cloud Platform (GCP)
  • Experience with Data Lakes, data warehousing, and large‑scale migration programs
  • Proven experience designing and implementing data lake architectures (e.g., Bronze/Silver/Gold or layered models)
  • Strong knowledge of Cloud Storage (GCS) design, including bucket layout, naming conventions, lifecycle policies, and access controls
  • Experience with Hadoop/HDFS architecture, distributed file systems, and data locality principles
  • Hands-on experience with columnar data formats (Parquet, Avro, ORC) and compression techniques
  • Expertise in partitioning strategies, backfills, and large-scale data organization
  • Ability to design data models optimized for analytics and BI consumption
  • Experience building batch and streaming ingestion pipelines using GCP-native services
  • Knowledge of Pub/Sub-based streaming architectures, event schema design, and versioning
  • Strong understanding of incremental ingestion and CDC patterns, including idempotency and deduplication
  • Hands-on experience with workflow orchestration tools (Cloud Composer / Airflow)
  • Ability to design robust error handling, replay, and backfill mechanisms
  • Experience developing scalable batch and streaming pipelines using Dataflow (Apache Beam) and/or Spark (Dataproc)
  • Strong proficiency in BigQuery SQL, including query optimization, partitioning, clustering, and cost control
  • Hands-on experience with Hadoop MapReduce and ecosystem tools (Hive, Pig, Sqoop)
  • Advanced Python programming skills for data engineering, including testing and maintainable code design
  • Experience managing schema evolution while minimizing downstream impact
  • Expertise in BigQuery performance optimization and data serving patterns
  • Experience building semantic layers and governed metrics for consistent analytics
  • Familiarity with BI integration, access controls, and dashboard standards
  • Understanding of data exposure patterns via views, APIs, or curated datasets
  • Experience implementing data catalogs, metadata management, and ownership models
  • Understanding of data lineage for auditability and troubleshooting
  • Strong focus on data quality frameworks, including validation, freshness checks, and alerting
  • Experience defining and enforcing data contracts, schemas, and SLAs
  • Hands-on experience implementing fine-grained access controls for BigQuery and GCS
  • Experience with sprint planning and providing technical guidance to the team
  • Strong stakeholder communication and solution‑architecture skills
  • 10–14+ years in DevOps and Data Architecture
  • 5+ years designing solutions at scale with PySpark, GCP, and OCP
  • Prior on‑prem → cloud migration experience (required)
  • Bachelor’s/Master’s in Computer Science, Information Systems, or equivalent experience

Nice To Haves

  • Google Cloud Professional Architect, DevOps, or OCP certification (required, or to be obtained within 3 months).
  • Professional Data Engineer certification.
  • Security Engineer certification.

Responsibilities

  • Implement CI/CD pipelines for data and analytics workloads.
  • Deploy data workloads and services using OpenShift Container Platform (OCP).
  • Manage secrets securely using GCP Secret Manager.
  • Own observability, SLOs, dashboards, alerts, and runbooks.
  • Implement logging, monitoring, and alerting for data pipelines and platform reliability.
  • Design and implement data lake architectures (e.g., Bronze/Silver/Gold or layered models).
  • Design and implement batch and streaming ingestion pipelines using GCP-native services.
  • Design robust error handling, replay, and backfill mechanisms for data pipelines.
  • Develop scalable batch and streaming pipelines using Dataflow (Apache Beam) and/or Spark (Dataproc).
  • Optimize BigQuery performance and implement data serving patterns.
  • Build semantic layers and governed metrics for consistent analytics.
  • Implement data catalogs, metadata management, and ownership models.
  • Define and enforce data contracts, schemas, and SLAs.
  • Implement fine-grained access controls for BigQuery and GCS.
  • Participate in Sprint planning and provide technical guidance to the team.

Benefits

  • Flexible work-from-home options available.
© 2026 Teal Labs, Inc