About The Position

Join Cisco’s Enterprise AI team, the core group enabling Generative AI-powered experiences across Cisco. Our mission is to build secure, scalable AI platforms that empower teams to safely develop, deploy, and operationalize AI-powered solutions. We operate at the intersection of applied AI, cloud infrastructure, and security, partnering across engineering, security, compliance, and product teams to bring trusted AI to life at enterprise scale. We are a fast-growing, highly collaborative team of platform engineers, AI engineers, and data scientists who value technical depth, ownership, and pragmatic execution. What makes this team exciting is the opportunity to define how secure Generative AI is built and governed inside a global technology leader.

As a Lead SRE, you will own the architectural integrity of our hybrid cloud infrastructure, ensuring our GCP and on-premises Kubernetes environments are resilient and secure. You will set the standard for automation and reliability that enables our AI models to scale globally.

Requirements

  • Bachelor’s Degree in Computer Science, Engineering, or a related field.
  • 7+ years of experience in Cloud/On-prem Operations, SRE, or DevOps.
  • Expert-level proficiency with Terraform, Kubernetes (GKE & On-prem), and Docker.
  • Hands-on expertise with Anthos Service Mesh (ASM), Istio, and Apigee.
  • Deep understanding of IAM implementation and GCP Quota management.

Nice To Haves

  • GCP Professional Cloud Security Engineer or Network Engineer certification.
  • Experience with the ELK stack (Elasticsearch/Kibana) for large-scale observability.
  • Strong financial acumen for cloud cost optimization and proactive budget alerting.
  • Experience managing complex traffic between cloud platforms and on-premises data centers.

Responsibilities

  • Lead the architectural design of scalable hybrid-cloud environments, managing GCP and on-premises Kubernetes clusters with Anthos Service Mesh (ASM) and Istio.
  • Direct the implementation of Identity and Access Management (IAM) policies and GCP Quota management to ensure secure and cost-effective resource utilization.
  • Architect multi-region, load-balanced microservices with DDoS hardening, end-to-end encryption, and automated secrets management.
  • Design a comprehensive observability strategy using Elasticsearch and Kibana to provide proactive alerts on service performance and cost envelope management.
  • Partner with development leads to integrate "Security by Design" into the automation and AI agent lifecycle using Apigee for secure API management.