Corporate Vice President - Google Cloud Platform (GCP) Lead - Enterprise Cloud & AI Platform

New York Life•New York, NY

Hybrid

About The Position

The GCP Lead at New York Life is responsible for designing, building, and operating secure, compliant, and scalable cloud and AI-enabled platforms on Google Cloud Platform (GCP). This role enables application, data, and analytics teams by delivering standardized cloud infrastructure, Kubernetes platforms, and governed access to approved Google AI services. The GCP Lead partners closely with Cloud, Data & AI, Information Security, and Risk teams to ensure platforms meet financial services requirements for security, regulatory compliance, resiliency, and operational excellence.

Requirements

8+ years of experience in cloud, platform engineering, or DevOps
Strong expertise in Google Cloud Platform (GCP) architecture and services
Experience designing and implementing Infrastructure as Code (Terraform or similar)
Deep experience with Kubernetes and GKE in enterprise environments
Proficiency in scripting (Python, Bash, or Go)
Strong understanding of cloud security, IAM, and networking
Experience operating in regulated or highly governed environments

Nice To Haves

Experience enabling and scaling Google AI services such as Vertex AI, Gemini APIs, and BigQuery ML
Hands-on experience with LLM-based applications, RAG architectures, and vector databases
Familiarity with MLOps practices (model deployment, monitoring, lifecycle management)
Experience designing AI inference workloads in production environments
Understanding of Responsible AI, data governance, and model risk management
GCP certifications (Professional Cloud Architect, Professional Cloud DevOps Engineer; AI certifications a plus)

Responsibilities

Design and deliver secure, scalable GCP architectures with strong governance, data controls, and observability
Build and operate shared cloud platforms supporting both AI and non-AI workloads
Architect infrastructure-as-code (IaC) for cloud, networking, and AI service enablement
Establish platform standards for observability, monitoring, and performance management
Define and implement reference architectures for AI-enabled applications and agentic workflows
Enable and operationalize approved Google AI services, including Vertex AI, Gemini APIs, and BigQuery ML
Design scalable architectures for LLM-based applications, including RAG pipelines and vector search
Establish patterns for multi-step reasoning, orchestration frameworks, and agent-based systems
Define memory strategies (short-term and long-term) for AI agents
Implement evaluation, monitoring, and guardrails for AI systems in production
Architect and operate GKE-based platforms for application and AI inference workloads
Define standardized containerization and deployment patterns using approved base images
Design scalable microservices architectures for AI APIs and services
Enable GPU-based workloads where appropriate and approved
Implement secure CI/CD pipelines aligned with enterprise SDLC standards
Establish MLOps foundations, including model deployment, versioning, promotion, and rollback
Integrate AI workloads into DevOps pipelines with appropriate controls and approvals
Enforce policy-as-code, guardrails, and governance for AI usage
Architect secure-by-design platforms using IAM, workload identity, and least-privilege access
Implement data protection controls including encryption, data residency, and access policies
Ensure compliance with regulatory requirements (e.g., SOC 2, SOX, data privacy)
Integrate platform telemetry with enterprise logging, monitoring, and SIEM systems
Support audits, risk assessments, and regulatory reviews
Design highly available, resilient, and fault-tolerant architectures on GCP
Define and enforce SLAs, SLOs, and SLIs across platform services
Implement disaster recovery and business continuity strategies
Establish observability standards (metrics, logs, tracing) and real-time monitoring
Optimize cloud and AI spend through cost controls, budgeting, and usage governance
Lead incident response and root cause analysis efforts
Partner with Data & AI, Information Security, Risk, and application teams to drive platform adoption
Define and promote enterprise standards for cloud and AI platform usage
Provide guidance on responsible and compliant AI adoption
Develop and maintain reference architectures, documentation, and best practices