This group is focused on Cloud Ops support but is starting to partner more with the peer SRE team, so an engineering background would be helpful, as the group may take on some SRE projects.
Responsible for the reliability and support of the Container Platform on-prem and in external clouds (Azure, AWS, Google).
Monitor and troubleshoot performance, connectivity, security, and other issues in Container Platform (OpenShift), Rancher (RKE), and Azure (AKS) environments.
Perform deep dives into systemic and latent reliability issues; handle incident management and problem management.
Identify, analyze, and resolve infrastructure vulnerabilities and application deployment issues.
Perform blameless RCA and partner with engineering and operations teams across the organization to roll out fixes.
Career Level
Mid Level