About The Position

The GCP Platform Lead at New York Life is responsible for designing, building, and operating secure, compliant, and scalable cloud and AI-enabled platforms on Google Cloud Platform (GCP). This role enables application, data, and analytics teams by delivering standardized cloud infrastructure, Kubernetes platforms, and governed access to approved Google AI services. This leader partners closely with Cloud, Data & AI, Information Security, and Risk teams to ensure platforms meet financial services requirements for security, regulatory compliance, resiliency, and operational excellence.

Requirements

  • 8+ years of experience in cloud, platform engineering, or DevOps
  • Strong expertise in Google Cloud Platform (GCP) architecture and services
  • Experience designing and implementing Infrastructure as Code (Terraform or similar)
  • Deep experience with Kubernetes and GKE in enterprise environments
  • Proficiency in scripting (Python, Bash, or Go)
  • Strong understanding of cloud security, IAM, and networking
  • Experience operating in regulated or highly governed environments

Nice To Haves

  • Experience enabling and scaling Google AI services such as Vertex AI, Gemini APIs, and BigQuery ML
  • Hands-on experience with LLM-based applications, RAG architectures, and vector databases
  • Familiarity with MLOps practices (model deployment, monitoring, lifecycle management)
  • Experience designing AI inference workloads in production environments
  • Understanding of Responsible AI, data governance, and model risk management
  • GCP certifications (Professional Cloud Architect, Professional Cloud DevOps Engineer; AI certifications a plus)

Responsibilities

  • Design and deliver secure, scalable GCP architectures with strong governance, data controls, and observability
  • Build and operate shared cloud platforms supporting both AI and non-AI workloads
  • Architect infrastructure-as-code (IaC) for cloud, networking, and AI service enablement
  • Establish platform standards for observability, monitoring, and performance management
  • Define and implement reference architectures for AI-enabled applications and agentic workflows
  • Enable and operationalize approved Google AI services, including Vertex AI, Gemini APIs, and BigQuery ML
  • Design scalable architectures for LLM-based applications, including RAG pipelines and vector search
  • Establish patterns for multi-step reasoning, orchestration frameworks, and agent-based systems
  • Define memory strategies (short-term and long-term) for AI agents
  • Implement evaluation, monitoring, and guardrails for AI systems in production
  • Architect and operate GKE-based platforms for application and AI inference workloads
  • Define standardized containerization and deployment patterns using approved base images
  • Design scalable microservices architectures for AI APIs and services
  • Enable GPU-based workloads where appropriate and approved
  • Implement secure CI/CD pipelines aligned with enterprise SDLC standards
  • Establish MLOps foundations, including model deployment, versioning, promotion, and rollback
  • Integrate AI workloads into DevOps pipelines with appropriate controls and approvals
  • Enforce policy-as-code, guardrails, and governance for AI usage
  • Architect secure-by-design platforms using IAM, workload identity, and least-privilege access
  • Implement data protection controls including encryption, data residency, and access policies
  • Ensure compliance with regulatory requirements (e.g., SOC 2, SOX, data privacy)
  • Integrate platform telemetry with enterprise logging, monitoring, and SIEM systems
  • Support audits, risk assessments, and regulatory reviews
  • Design highly available, resilient, and fault-tolerant architectures on GCP
  • Define and enforce SLAs, SLOs, and SLIs across platform services
  • Implement disaster recovery and business continuity strategies
  • Establish observability standards (metrics, logs, tracing) and real-time monitoring
  • Optimize cloud and AI spend through cost controls, budgeting, and usage governance
  • Lead incident response and root cause analysis efforts
  • Partner with Data & AI, Information Security, Risk, and application teams to drive platform adoption
  • Define and promote enterprise standards for cloud and AI platform usage
  • Provide guidance on responsible and compliant AI adoption
  • Develop and maintain reference architectures, documentation, and best practices

Benefits

  • Leave programs
  • Adoption assistance
  • Student loan repayment programs