Staff Site Reliability Engineer

Pismo
Austin, TX
Hybrid

About The Position

The Staff Platform Engineer is an individual contributor within the SRE / Platform organization, responsible for operating, maintaining, and improving cloud‑native platforms that support critical workloads. This role focuses on platform reliability, operational excellence, and automation, ensuring systems are stable, scalable, and well‑run in production. The Staff Platform Engineer works primarily on Azure‑based platforms, while actively contributing to AWS environments as required by current initiatives. This role is execution‑focused, with strong involvement in day‑to‑day platform operations and continuous improvement efforts.

Requirements

  • 5 or more years of relevant work experience with a Bachelor's degree, or at least 2 years of work experience with an advanced degree (e.g., Master's, MBA, JD, MD), or 0 years of work experience with a PhD

Nice To Haves

  • 6 or more years of work experience with a Bachelor's degree, or 4 or more years of relevant experience with an advanced degree (e.g., Master's, MBA, JD, MD), or up to 3 years of relevant experience with a PhD
  • Strong hands-on experience with:
      • Public cloud platforms (Azure preferred, and AWS)
      • Kubernetes at scale (AKS, EKS, or equivalent)
      • Infrastructure as Code (e.g., Terraform)
      • Containerized, cloud‑native microservices architectures
  • Background in Platform Engineering, SRE, or DevOps roles supporting production systems and day‑to‑day platform operations
  • Strong understanding of:
      • Observability tooling and Golden Signals concepts
      • Incident management concepts and on-call operations
      • Platform reliability, availability, and operational best practices
      • Networking, ingress, and service discovery in cloud and Kubernetes environments
  • Strong collaboration and communication skills

Responsibilities

  • Platform Operations & Reliability
      • Operate and support core platform components, including:
          • Cloud infrastructure primitives
          • Kubernetes clusters and supporting services
          • Networking, ingress, and service discovery
      • Ensure platforms meet reliability and availability expectations through proactive monitoring and maintenance.
      • Identify operational issues and contribute to improvements that reduce instability and recurring incidents.
  • SRE Practices & On‑Call Support
      • Participate in on-call rotations, acting as a responder for platform‑related incidents.
      • Troubleshoot production issues, perform root cause analysis, and contribute to post-incident reviews.
      • Maintain and improve operational runbooks, alerts, and dashboards.
  • Automation & Infrastructure as Code
      • Implement and maintain Infrastructure‑as‑Code for platform resources and environments.
      • Contribute to automation initiatives that reduce manual work and operational toil.
      • Support standardized deployment, upgrade, and rollback processes.
  • Continuous Improvement
      • Assist in simplifying day‑2 operations and improving platform operability.
      • Contribute to efforts that reduce Mean Time to Detect (MTTD) and Mean Time to Recover (MTTR).
      • Follow established platform standards and best practices, providing feedback for improvement.
  • Collaboration
      • Work closely with other platform engineers, SREs, and application teams.
      • Support platform adoption by helping application teams troubleshoot and operate their workloads.
      • Escalate complex issues to senior engineers when needed, while learning from hands-on experience.

Benefits

  • Visa offers a comprehensive benefits package, for which this position may be eligible, that includes Medical, Dental, Vision, 401(k), FSA/HSA, Life Insurance, Paid Time Off, and a Wellness Program.