About The Position

We are a global leader in online protection, dedicated to making the digital world a safer place. We are seeking a highly experienced and hands-on Principal Infrastructure Architect with a deep background in large-scale multi-cloud environments (AWS, GCP, Azure) and modern SaaS delivery. This is a unique opportunity to lead the architectural evolution of our platform, driving a critical migration from legacy EC2 topologies to cloud-native EKS/Kubernetes clusters, and designing the backbone for our next-generation AI and real-time data services. We highly value experience gained at FAANG or other leading Big Tech companies. This is a Hybrid position located at either or San Jose or Newport Beach, CA offices. You will be required to be on-site 2 to 3 days per week. When you are not working on-site, you will be working from your home office. We are only considering candidates within a commutable distance to either San Jose or Newport Beach, CA offices and are not offering relocation assistance at this time About the Role: Cloud Native Strategy & Migration: Lead the architectural design and execution of migrating legacy EC2-based workloads to Amazon EKS and Kubernetes. Define standards for multi-region availability, auto-scaling, and spot instance orchestration. Advanced Traffic Management: Architect and deploy high-performance API Gateways and specialized LLM Gateways to manage traffic for Generative AI workloads. Implement Service Mesh (e.g., Istio, Linkerd) for advanced traffic splitting, mTLS, and observability. Real-Time Data Infrastructure: Design robust infrastructure for diverse storage engines, including AWS-native databases (DynamoDB, Aurora), OLAP systems, and real-time databases like Aerospike and Druid to support sub-millisecond latency requirements. Event-Driven Backbone: Architect scalable Pub/Sub messaging systems (Kafka, SNS/SQS, Pulsar) to decouple microservices and enable event-driven architectures at internet scale. Comprehensive Observability: Define and implement a unified observability strategy based on OpenTelemetry (OTLP) standards. Integrate platforms like Grafana, Datadog, and Graylog to provide "single pane of glass" visibility into logs, metrics, and traces. Identity & Security Engineering: Modernize Authentication and Authorization systems (OIDC, OAuth2, SPIFFE/SPIRE). Deploy and manage centralized Secret Stores (HashiCorp Vault, AWS Secrets Manager), security gateways, and automated certificate management systems. Infrastructure as Code (IaC): Champion a "GitOps" culture by treating infrastructure as software. Enforce best practices using Terraform, Crossplane, or Pulumi, ensuring all environments are reproducible and audit-compliant. Technical Leadership: Mentor senior infrastructure engineers, drive "Well-Architected" reviews, and collaborate with software teams to ensure infrastructure supports rapid product iteration. About You: 10+ years of professional experience in infrastructure engineering and architecture, with a proven track record of managing large-scale cloud deployments in AWS (primary), GCP, or Azure. Kubernetes Expert: Deep, hands-on mastery of Kubernetes and EKS, including experience with custom controllers, operators, and migrating stateful/stateless workloads from VMs to containers. Database Reliability: Strong experience architecting infrastructure for high-scale data systems, specifically real-time stores (Aerospike, Redis) and analytics engines (Druid, ClickHouse). Security & Compliance: Extensive experience designing secure infrastructure, including implementation of Zero Trust networks, WAFs, and secret management systems. Familiarity with compliance standards (SOC2, PCI-DSS) is a plus. Observability Stack: Proven ability to build observability pipelines from scratch using OpenTelemetry collectors and backend visualization tools (Prometheus/Grafana/Datadog). Automation & Scripting: Expert-level proficiency in Go, Python, or Bash. You automate toil relentlessly and have deep experience with CI/CD pipelines (GitLab CI, GitHub Actions, ArgoCD). Networking Fundamentals: Deep understanding of cloud networking (VPC, Transit Gateways, Direct Connect) and protocols (gRPC, HTTP/2, WebSocket, QUIC).

Requirements

  • 10+ years of professional experience in infrastructure engineering and architecture, with a proven track record of managing large-scale cloud deployments in AWS (primary), GCP, or Azure.
  • Kubernetes Expert: Deep, hands-on mastery of Kubernetes and EKS, including experience with custom controllers, operators, and migrating stateful/stateless workloads from VMs to containers.
  • Database Reliability: Strong experience architecting infrastructure for high-scale data systems, specifically real-time stores (Aerospike, Redis) and analytics engines (Druid, ClickHouse).
  • Security & Compliance: Extensive experience designing secure infrastructure, including implementation of Zero Trust networks, WAFs, and secret management systems.
  • Observability Stack: Proven ability to build observability pipelines from scratch using OpenTelemetry collectors and backend visualization tools (Prometheus/Grafana/Datadog).
  • Automation & Scripting: Expert-level proficiency in Go, Python, or Bash. You automate toil relentlessly and have deep experience with CI/CD pipelines (GitLab CI, GitHub Actions, ArgoCD).
  • Networking Fundamentals: Deep understanding of cloud networking (VPC, Transit Gateways, Direct Connect) and protocols (gRPC, HTTP/2, WebSocket, QUIC).

Nice To Haves

  • Familiarity with compliance standards (SOC2, PCI-DSS) is a plus.

Responsibilities

  • Lead the architectural design and execution of migrating legacy EC2-based workloads to Amazon EKS and Kubernetes.
  • Define standards for multi-region availability, auto-scaling, and spot instance orchestration.
  • Architect and deploy high-performance API Gateways and specialized LLM Gateways to manage traffic for Generative AI workloads.
  • Implement Service Mesh (e.g., Istio, Linkerd) for advanced traffic splitting, mTLS, and observability.
  • Design robust infrastructure for diverse storage engines, including AWS-native databases (DynamoDB, Aurora), OLAP systems, and real-time databases like Aerospike and Druid to support sub-millisecond latency requirements.
  • Architect scalable Pub/Sub messaging systems (Kafka, SNS/SQS, Pulsar) to decouple microservices and enable event-driven architectures at internet scale.
  • Define and implement a unified observability strategy based on OpenTelemetry (OTLP) standards.
  • Integrate platforms like Grafana, Datadog, and Graylog to provide "single pane of glass" visibility into logs, metrics, and traces.
  • Modernize Authentication and Authorization systems (OIDC, OAuth2, SPIFFE/SPIRE).
  • Deploy and manage centralized Secret Stores (HashiCorp Vault, AWS Secrets Manager), security gateways, and automated certificate management systems.
  • Champion a "GitOps" culture by treating infrastructure as software.
  • Enforce best practices using Terraform, Crossplane, or Pulumi, ensuring all environments are reproducible and audit-compliant.
  • Mentor senior infrastructure engineers, drive "Well-Architected" reviews, and collaborate with software teams to ensure infrastructure supports rapid product iteration.

Benefits

  • Bonus Program
  • 401k Retirement Plan
  • Medical, Dental, Vision, Basic Life, Short Term Disability and Long-Term Disability Coverage
  • Paid Parental Leave
  • Support for Community Involvement
  • 14 Paid Company Holidays
  • Unlimited Paid Time Off for Exempt Employees
  • 96 Hours of Sick Time and 120 Hours of Vacation for Non-Exempt Employees Accrued Each Year
© 2024 Teal Labs, Inc
Privacy PolicyTerms of Service