Principal Platform Engineer — Data Private Cloud (Kubernetes/OpenShift)

Wells Fargo · Iselin, NJ
$159,000 - $305,000 · Hybrid

About The Position

Wells Fargo is back in the office three days a week, collaborating for fabulous outcomes! This role does not offer visa sponsorship or visa transfers.

We're seeking a Principal Platform Engineer to lead the technical strategy, architecture, and delivery of Wells Fargo's enterprise Data Private Cloud. This is a Kubernetes/OpenShift platform engineering role, responsible for designing and operating the large-scale infrastructure that powers data, analytics, and AI workloads across the company.

This is not a data engineering role. You won't be building pipelines or analytics solutions. Instead, you will:

  • Architect the Kubernetes-based data platform
  • Build infrastructure, automation, and security foundations
  • Define standards, controls, and multi-tenant patterns
  • Enable data engineers, ML engineers, and analytics teams by providing a scalable, secure platform they run on

This is a hands-on senior engineering role with end-to-end ownership of platform architecture.

Core Responsibilities

Technical Leadership & Architecture

  • Own the architecture of the enterprise data platform (OpenShift, Kubernetes, modern data stacks)
  • Define platform standards for security, scalability, multi-tenancy, and operational excellence
  • Lead decisions around compute orchestration (Spark on K8s, YuniKorn), query federation (Trino, Kyuubi), and metadata management (Gravitino, Hive Metastore)
  • Design authentication/authorization (Keycloak, AD, Ranger)
  • Shape infrastructure strategy and open-source deployment patterns

Infrastructure & Platform Engineering

  • Lead Terraform-based IaC and repeatable deployment practices
  • Architect networking, ingress, and service mesh configurations
  • Oversee PKI, SSL/TLS, and certificate lifecycle management
  • Build monitoring and observability strategies (OpenSearch, Prometheus, Grafana)
  • Ensure resilience through scheduling, quotas, and capacity planning
  • Implement GitOps for declarative deployments

Data Platform Components (Platform Enablement, Not Data Engineering)

You provide leadership for the platform that runs these technologies, not the pipelines or applications built on them:

  • Compute: Spark on K8s, Kyuubi, JupyterHub
  • Query/Analytics: Trino, Superset
  • Orchestration: Airflow on Kubernetes
  • Catalog/Governance: Gravitino, DataHub, Ranger
  • Storage: Iceberg, S3/NetApp, PostgreSQL
  • Messaging/Search: Kafka, OpenSearch

Security & Compliance

  • Ensure compliance with regulatory requirements (OSFI, SOX, PCI-DSS)
  • Implement multi-tenant isolation and robust security boundaries (illustrated in the sketch after this section)
  • Lead security reviews, threat modeling, and remediation
  • Partner with Security, Risk, and Compliance teams on audits and controls
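
To make the multi-tenancy and quota responsibilities above more concrete, here is a minimal, illustrative sketch of tenant onboarding using the official kubernetes Python client: it creates a namespace, applies a ResourceQuota, and installs a default-deny ingress NetworkPolicy. The tenant name, quota values, and kubeconfig context are hypothetical placeholders, not details from this posting.

```python
# Minimal sketch: onboard a tenant namespace with a quota and a default-deny
# ingress policy. Assumes the official `kubernetes` Python client and a
# kubeconfig with sufficient rights; names and limits are illustrative.
from kubernetes import client, config

TENANT_NS = "tenant-analytics"  # hypothetical tenant namespace

config.load_kube_config()  # or config.load_incluster_config() inside the cluster
core = client.CoreV1Api()
net = client.NetworkingV1Api()

# 1. Namespace that carries a tenant label for policy and chargeback selectors.
core.create_namespace(
    client.V1Namespace(
        metadata=client.V1ObjectMeta(name=TENANT_NS, labels={"tenant": "analytics"})
    )
)

# 2. ResourceQuota so one tenant cannot starve shared cluster capacity.
core.create_namespaced_resource_quota(
    namespace=TENANT_NS,
    body=client.V1ResourceQuota(
        metadata=client.V1ObjectMeta(name="compute-quota"),
        spec=client.V1ResourceQuotaSpec(
            hard={"requests.cpu": "200", "requests.memory": "800Gi", "pods": "500"}
        ),
    ),
)

# 3. Default-deny ingress NetworkPolicy as the baseline isolation boundary.
net.create_namespaced_network_policy(
    namespace=TENANT_NS,
    body=client.V1NetworkPolicy(
        metadata=client.V1ObjectMeta(name="default-deny-ingress"),
        spec=client.V1NetworkPolicySpec(
            pod_selector=client.V1LabelSelector(),  # empty selector matches all pods
            policy_types=["Ingress"],
        ),
    ),
)
```

In practice these objects would typically be rendered from Terraform or GitOps manifests rather than created imperatively; the script only shows the shape of the guardrails the role owns.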

Requirements

  • 7+ years of Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • 5+ years of hands-on experience with Kubernetes in production environments (OpenShift Container Platform strongly preferred)

Nice To Haves

  • Experience in financial services or other highly regulated industries
  • Experience with Kubernetes scheduling frameworks (YuniKorn, Volcano) for batch and AI workload optimization
  • Contributions to open-source projects in the data or infrastructure space
  • Experience building and deploying applications with enterprise data sources
  • Hands-on experience with transformer architectures and fine-tuning open-source models
  • Professional certifications: CKA/CKAD, AWS/Azure/GCP Professional, Terraform Associate
  • Experience with GitOps practices (ArgoCD, Flux)
  • Background in platform product management or developer experience
  • Expert-level Kubernetes knowledge: deployments, stateful workloads, operators, CRDs, RBAC, network policies, storage classes
  • OpenShift Container Platform: Routes, SCCs, cluster administration, operator lifecycle management
  • Infrastructure-as-code with Terraform (modules, state management, provider development)
  • Container runtimes, image registries, and CI/CD pipeline integration
  • Apache Spark: architecture, tuning, Spark on Kubernetes, dynamic resource allocation (see the Spark sketch after this list)
  • Distributed SQL engines (Trino, Presto) including federation and connector development
  • Apache Airflow: DAG design, executor configurations, Kubernetes executor
  • Data catalog and lineage tools (DataHub, Apache Atlas, or similar)
  • Apache Ranger or equivalent fine-grained authorization frameworks
  • Apache Iceberg or similar table formats; Hive Metastore operations
  • AIOps tools: anomaly detection with Prophet, PyOD, or custom models; log analytics with OpenSearch ML
  • Prometheus with recording rules, Grafana ML features, custom alerting models
  • Enterprise identity integration: LDAP, Active Directory, SAML, OIDC (see the OIDC sketch after this list)
  • Keycloak administration, realm configuration, and custom provider development
  • PKI, certificate management, and TLS termination strategies
  • Secrets management (HashiCorp Vault, Kubernetes secrets, external secrets operators)
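
As a concrete illustration of the Spark-on-Kubernetes and dynamic resource allocation items above, here is a minimal PySpark sketch. The API server URL, namespace, container image, and service account are hypothetical placeholders; only the Spark configuration keys themselves are standard.

```python
# Minimal sketch: a PySpark session targeting a Kubernetes master with dynamic
# allocation enabled. Endpoint, namespace, image, and service account are
# illustrative placeholders, not values from this posting.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("platform-smoke-test")
    .master("k8s://https://kubernetes.default.svc:443")  # in-cluster API endpoint (assumption)
    .config("spark.kubernetes.namespace", "tenant-analytics")
    .config("spark.kubernetes.container.image", "registry.example.com/spark:3.5")
    .config("spark.kubernetes.authenticate.driver.serviceAccountName", "spark-sa")
    # Scale executors up and down with load instead of pinning a fixed count.
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "1")
    .config("spark.dynamicAllocation.maxExecutors", "20")
    .getOrCreate()
)

# Trivial job to confirm executor pods schedule and results come back.
spark.range(1_000_000).selectExpr("sum(id) AS total").show()
spark.stop()
```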
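
For the Keycloak/OIDC items, a minimal sketch of a confidential client obtaining a service token via the OIDC client-credentials grant is below. The Keycloak host, realm, client ID, and secret are hypothetical; newer Keycloak versions serve realms at /realms/..., while older builds prefix the path with /auth.

```python
# Minimal sketch: fetch a service-account access token from Keycloak using the
# OIDC client-credentials grant. Host, realm, client_id, and secret are
# hypothetical placeholders.
import requests

KEYCLOAK_BASE = "https://keycloak.example.com"  # hypothetical host
REALM = "data-platform"                         # hypothetical realm

token_url = f"{KEYCLOAK_BASE}/realms/{REALM}/protocol/openid-connect/token"
resp = requests.post(
    token_url,
    data={
        "grant_type": "client_credentials",
        "client_id": "trino-gateway",    # hypothetical confidential client
        "client_secret": "REDACTED",     # pulled from a secrets manager in practice
    },
    timeout=10,
)
resp.raise_for_status()
access_token = resp.json()["access_token"]
# The token is then presented as a Bearer credential to platform services that
# validate it against the same realm.
```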

Benefits

  • Health benefits
  • 401(k) Plan
  • Paid time off
  • Disability benefits
  • Life insurance, critical illness insurance, and accident insurance
  • Parental leave
  • Critical caregiving leave
  • Discounts and savings
  • Commuter benefits
  • Tuition reimbursement
  • Scholarships for dependent children
  • Adoption reimbursement