We are a global leader in online protection, dedicated to making the digital world a safer place. We are seeking a highly experienced, hands-on Principal Infrastructure Architect with a deep background in large-scale multi-cloud environments (AWS, GCP, Azure) and modern SaaS delivery. This is a unique opportunity to lead the architectural evolution of our platform, driving a critical migration from legacy EC2 topologies to cloud-native EKS/Kubernetes clusters, and designing the backbone for our next-generation AI and real-time data services. We highly value experience gained at FAANG or other leading Big Tech companies.

This is a hybrid position based at either our San Jose or Newport Beach, CA office. You will be required to be on-site 2 to 3 days per week; on the remaining days you will work from your home office. We are only considering candidates within commutable distance of either office, and we are not offering relocation assistance at this time.

About the Role:
Cloud Native Strategy & Migration: Lead the architectural design and execution of migrating legacy EC2-based workloads to Amazon EKS and Kubernetes. Define standards for multi-region availability, auto-scaling, and spot instance orchestration.
Advanced Traffic Management: Architect and deploy high-performance API Gateways and specialized LLM Gateways to manage traffic for Generative AI workloads. Implement Service Mesh (e.g., Istio, Linkerd) for advanced traffic splitting, mTLS, and observability.
Real-Time Data Infrastructure: Design robust infrastructure for diverse storage engines, including AWS-native databases (DynamoDB, Aurora), OLAP systems, and real-time databases like Aerospike and Druid to support sub-millisecond latency requirements.
Event-Driven Backbone: Architect scalable Pub/Sub messaging systems (Kafka, SNS/SQS, Pulsar) to decouple microservices and enable event-driven architectures at internet scale.
Comprehensive Observability: Define and implement a unified observability strategy based on OpenTelemetry and the OTLP protocol. Integrate platforms like Grafana, Datadog, and Graylog to provide "single pane of glass" visibility into logs, metrics, and traces.
Identity & Security Engineering: Modernize Authentication and Authorization systems (OIDC, OAuth2, SPIFFE/SPIRE). Deploy and manage centralized Secret Stores (HashiCorp Vault, AWS Secrets Manager), security gateways, and automated certificate management systems.
Infrastructure as Code (IaC): Champion a "GitOps" culture by treating infrastructure as software. Enforce best practices using Terraform, Crossplane, or Pulumi, ensuring all environments are reproducible and audit-compliant.
Technical Leadership: Mentor senior infrastructure engineers, drive "Well-Architected" reviews, and collaborate with software teams to ensure infrastructure supports rapid product iteration.

About You:
10+ years of professional experience in infrastructure engineering and architecture, with a proven track record of managing large-scale cloud deployments in AWS (primary), GCP, or Azure.
Kubernetes Expert: Deep, hands-on mastery of Kubernetes and EKS, including experience with custom controllers, operators, and migrating stateful/stateless workloads from VMs to containers.
Database Reliability: Strong experience architecting infrastructure for high-scale data systems, specifically real-time stores (Aerospike, Redis) and analytics engines (Druid, ClickHouse).
Security & Compliance: Extensive experience designing secure infrastructure, including implementation of Zero Trust networks, WAFs, and secret management systems. Familiarity with compliance standards (SOC2, PCI-DSS) is a plus.
Observability Stack: Proven ability to build observability pipelines from scratch using OpenTelemetry collectors and backend visualization tools (Prometheus/Grafana/Datadog).
Automation & Scripting: Expert-level proficiency in Go, Python, or Bash. You automate toil relentlessly and have deep experience with CI/CD pipelines (GitLab CI, GitHub Actions, ArgoCD).
Networking Fundamentals: Deep understanding of cloud networking (VPC, Transit Gateways, Direct Connect) and protocols (gRPC, HTTP/2, WebSocket, QUIC).
Job Type
Full-time
Career Level
Principal
Education Level
No Education Listed