Staff Site Reliability Engineer

Pismo•Austin, TX

Hybrid

About The Position

The Staff Platform Engineer is an individual contributor within the SRE / Platform organization, responsible for operating, maintaining, and improving cloud‑native platforms that support critical workloads. This role focuses on platform reliability, operational excellence, and automation, ensuring systems are stable, scalable, and well‑run in production. The Staff Platform Engineer works primarily on Azure‑based platforms, while actively contributing to AWS environments as required by current initiatives. This role is execution‑focused, with strong involvement in day‑to‑day platform operations and continuous improvement efforts.

Requirements

5 or more years of relevant work experience with a Bachelor's degree, or at least 2 years of work experience with an advanced degree (e.g., Master's, MBA, JD, MD), or 0 years of work experience with a PhD

Nice To Haves

6 or more years of work experience with a Bachelor's degree, or 4 or more years of relevant experience with an advanced degree (e.g., Master's, MBA, JD, MD), or up to 3 years of relevant experience with a PhD
Strong hands-on experience with:
Public cloud platforms (Azure preferred, plus AWS)
Kubernetes at scale (AKS, EKS, or equivalent)
Infrastructure as Code (e.g., Terraform)
Containerized, cloud‑native microservices architectures
Background in Platform Engineering, SRE, or DevOps roles, supporting production systems and day‑to‑day platform operations.
Strong understanding of:
Observability tooling and Golden Signals concepts
Incident management concepts and on-call operations
Platform reliability, availability, and operational best practices
Networking, ingress, and service discovery in cloud and Kubernetes environments
Strong collaboration and communication skills

Responsibilities

Platform Operations & Reliability
Operate and support core platform components, including:
Cloud infrastructure primitives
Kubernetes clusters and supporting services
Networking, ingress, and service discovery
Ensure platforms meet reliability and availability expectations through proactive monitoring and maintenance.
Identify operational issues and contribute to improvements that reduce instability and recurring incidents.
SRE Practices & On‑Call Support
Participate in on-call rotations, acting as a responder for platform‑related incidents.
Troubleshoot production issues, perform root cause analysis, and contribute to post-incident reviews.
Maintain and improve operational runbooks, alerts, and dashboards.
Automation & Infrastructure as Code
Implement and maintain Infrastructure‑as‑Code for platform resources and environments.
Contribute to automation initiatives that reduce manual work and operational toil.
Support standardized deployment, upgrade, and rollback processes.
Continuous Improvement
Assist in simplifying day‑2 operations and improving platform operability.
Contribute to efforts that reduce Mean Time to Detect (MTTD) and Mean Time to Recover (MTTR).
Follow established platform standards and best practices, providing feedback for improvement.
Collaboration
Work closely with other platform engineers, SREs, and application teams.
Support platform adoption by helping application teams troubleshoot and operate their workloads.
Escalate complex issues to senior engineers when needed, while learning from hands-on experience.

Benefits

Visa has a comprehensive benefits package for which this position may be eligible. It includes Medical, Dental, Vision, 401(k), FSA/HSA, Life Insurance, Paid Time Off, and a Wellness Program.
