Principal Infrastructure Architect – Cloud & SaaS Platforms

McAfee•San Jose, CA

Hybrid

About The Position

We are a global leader in online protection, dedicated to making the digital world a safer place. We are seeking a highly experienced, hands-on Principal Infrastructure Architect with a deep background in large-scale multi-cloud environments (AWS, GCP, Azure) and modern SaaS delivery. This is a unique opportunity to lead the architectural evolution of our platform: driving a critical migration from legacy EC2 topologies to cloud-native EKS/Kubernetes clusters and designing the backbone for our next-generation AI and real-time data services. We highly value experience gained at FAANG or other leading Big Tech companies.

This is a hybrid position based at either our San Jose or Newport Beach, CA office. You will be required to be on-site 2 to 3 days per week and will work from your home office on the remaining days. We are only considering candidates within commuting distance of the San Jose or Newport Beach, CA offices and are not offering relocation assistance at this time.

Requirements

10+ years of professional experience in infrastructure engineering and architecture, with a proven track record of managing large-scale cloud deployments in AWS (primary), GCP, or Azure.
Kubernetes Expert: Deep, hands-on mastery of Kubernetes and EKS, including experience with custom controllers, operators, and migrating stateful/stateless workloads from VMs to containers.
Database Reliability: Strong experience architecting infrastructure for high-scale data systems, specifically real-time stores (Aerospike, Redis) and analytics engines (Druid, ClickHouse).
Security & Compliance: Extensive experience designing secure infrastructure, including implementation of Zero Trust networks, WAFs, and secret management systems.
Observability Stack: Proven ability to build observability pipelines from scratch using OpenTelemetry collectors and backend visualization tools (Prometheus/Grafana/Datadog).
Automation & Scripting: Expert-level proficiency in Go, Python, or Bash. You automate toil relentlessly and have deep experience with CI/CD pipelines (GitLab CI, GitHub Actions, ArgoCD).
Networking Fundamentals: Deep understanding of cloud networking (VPC, Transit Gateways, Direct Connect) and protocols (gRPC, HTTP/2, WebSocket, QUIC).

Nice To Haves

Familiarity with compliance standards (SOC2, PCI-DSS) is a plus.

Responsibilities

Lead the architectural design and execution of the migration of legacy EC2-based workloads to Amazon EKS and Kubernetes.
Define standards for multi-region availability, auto-scaling, and spot instance orchestration.
Architect and deploy high-performance API gateways and specialized LLM gateways to manage traffic for Generative AI workloads.
Implement a service mesh (e.g., Istio, Linkerd) for advanced traffic splitting, mTLS, and observability.
Design robust infrastructure for diverse storage engines, including AWS-native databases (DynamoDB, Aurora), OLAP systems, and real-time databases like Aerospike and Druid to support sub-millisecond latency requirements.
Architect scalable Pub/Sub messaging systems (Kafka, SNS/SQS, Pulsar) to decouple microservices and enable event-driven architectures at internet scale.
Define and implement a unified observability strategy based on OpenTelemetry (OTLP) standards.
Integrate platforms like Grafana, Datadog, and Graylog to provide "single pane of glass" visibility into logs, metrics, and traces.
Modernize authentication and authorization systems (OIDC, OAuth2, SPIFFE/SPIRE).
Deploy and manage centralized secret stores (HashiCorp Vault, AWS Secrets Manager), security gateways, and automated certificate management systems.
Champion a "GitOps" culture by treating infrastructure as software.
Enforce best practices using Terraform, Crossplane, or Pulumi, ensuring all environments are reproducible and audit-compliant.
Mentor senior infrastructure engineers, drive "Well-Architected" reviews, and collaborate with software teams to ensure infrastructure supports rapid product iteration.

Benefits

Bonus Program
401k Retirement Plan
Medical, Dental, Vision, Basic Life, Short-Term Disability, and Long-Term Disability Coverage
Paid Parental Leave
Support for Community Involvement
14 Paid Company Holidays
Unlimited Paid Time Off for Exempt Employees
96 Hours of Sick Time and 120 Hours of Vacation for Non-Exempt Employees Accrued Each Year
