Infra Apps SRE

Zoom
San Jose, CA
Posted 2d ago
$124,000 - $271,200
Hybrid

About The Position

The ideal candidate brings a strong SRE and automation mindset to help transform our operations from reactive ticketing to proactive engineering. You will not just resolve issues; you will code the solutions that prevent them. Expect to work on a mix of Identity/Auth systems, Cloud Infrastructure, and Observability: you will support product teams and AI initiatives by building Infrastructure as Code (IaC) for Identity, Directory Services, and SaaS, while maintaining a unified “Single Pane of Glass” monitoring stack.

We are the engineering engine behind the organization's internal technology. As a modern infrastructure team, we serve as "Customer Zero" for our products, optimizing our cloud footprint while ensuring security and efficiency. We operate with a DevOps/SRE mindset to support a diverse landscape spanning AWS, Azure, GCP, and critical SaaS platforms like Okta and Zoom.

Requirements

  • Have 5+ years of experience.
  • Hold a BS/MS in Computer Science, MIS, or equivalent engineering experience.
  • Have extensive experience creating tools and automation services with Python or Golang.
  • Possess a solid background in automating infrastructure using standard tools like Terraform, Pulumi, or Ansible.
  • Have a good understanding of authentication security practices (IAM/IGA, MFA, SSO, PKI) and experience managing Okta or similar Identity Providers.
  • Be experienced in administering major cloud platforms (AWS is primary, with familiarity with Azure/GCP) and Kubernetes internals.
  • Have proven ability to implement and tune monitoring stacks (Prometheus, Grafana, ELK/Splunk) to diagnose complex technical issues.
  • Have experience building pipelines with GitHub Actions/Jenkins and managing deployments via ArgoCD.
  • Be able to diagnose and resolve complex technical issues from the physical/network layer up to the application/session level.

Responsibilities

  • Architecting and maintaining secure infrastructure across AWS, Azure, and GCP.
  • Managing Identity and Access Management (IAM) flows, integrating Okta, Active Directory, and SaaS applications (Zoom, Google Workspace) to ensure seamless, secure access.
  • Developing and maintaining Terraform modules and Ansible playbooks to automate the provisioning of cloud resources, Kubernetes (EKS) clusters, and configuration management.
  • Managing and scaling our hybrid monitoring stack.
  • Configuring Prometheus, Loki, and Grafana for infrastructure metrics.
  • Driving the adoption of ArgoCD to manage infrastructure deployments, ensuring that Git remains the single source of truth for our configurations.
  • Overseeing the deployment and management of critical SaaS applications, ensuring integration with our Unified Collaboration tools (Zoom, Google, Proofpoint).
  • Identifying and resolving complex issues across the network, application, and identity layers. You will lead blameless post-mortems and define SLOs to ensure compliance with service-level agreements.