Site Reliability Engineer - Sr. Consultant level

Visa•Austin, TX

Hybrid

About The Position

The Senior Consultant Platform Engineer is a senior individual contributor within the SRE / Platform organization, acting as a technical authority for cloud platform reliability, scalability, and architecture. This role is responsible for designing, building, and evolving cloud‑native platforms that support critical workloads, ensuring they are operated according to SRE and cloud‑native best practices. This position requires deep expertise in Azure, with the flexibility to actively contribute to AWS‑based platforms, especially during transitional periods. The Senior Consultant is expected to lead by example through hands‑on delivery, architectural decision‑making, and influence across multiple teams.

Requirements

8 or more years of relevant work experience with a Bachelor's degree, or at least 5 years of experience with an advanced degree (e.g., Master's, MBA, JD, MD), or 2 years of work experience with a PhD

Nice To Haves

9 or more years of relevant work experience with a Bachelor's degree, or 7 or more years of relevant experience with an advanced degree (e.g., Master's, MBA, JD, MD), or 3 or more years of experience with a PhD
Strong hands-on experience with: public cloud platforms (Azure preferred, AWS); Kubernetes at scale (AKS, EKS, or equivalent); Infrastructure as Code (e.g., Terraform); and service mesh technologies (e.g., Istio preferred, App Mesh, Linkerd).
Proven background in Platform Engineering or SRE roles, operating and supporting production platforms.
Strong understanding of: designing cloud architectures for scalability, resilience, security, and cost efficiency; cloud-native, containerized microservices architecture; observability tooling and Golden Signals concepts; and incident management concepts and on-call operations.
Strong collaboration and communication skills.

Responsibilities

Design and evolve cloud platform architectures primarily on Azure, while contributing to AWS‑based environments as needed.
Define and implement scalable, resilient, and secure platform patterns for Kubernetes‑based workloads.
Act as a technical reference for cloud architecture, platform design, and reliability standards.
Own the end‑to‑end lifecycle of core platform components, including: cloud infrastructure primitives; Kubernetes clusters and supporting services; and networking, ingress, and service discovery.
Ensure platforms are resilient by design, applying SRE principles such as capacity planning, fault isolation, graceful degradation, and clear failure modes.
Proactively identify and mitigate reliability and scalability risks.
Drive automation across platform provisioning, configuration, and operations.
Design and maintain Infrastructure‑as‑Code solutions that are reproducible, auditable, and scalable.
Promote automation‑first and GitOps‑aligned approaches to reduce manual effort and operational risk.
Apply and promote SRE practices across platform operations, including: on‑call participation as a senior escalation point; incident response, root cause analysis, and post‑incident reviews; and the definition and maintenance of operational standards and runbooks.
Improve platform operability by simplifying day‑2 operations and reducing Mean Time to Detect (MTTD) and Mean Time to Recover (MTTR).
Collaborate closely with application teams, security, and other platform stakeholders.
Influence technical standards, architectural decisions, and platform best practices across the organization.
Mentor engineers through technical guidance and example, without direct people management responsibility.