Ops Support (Cloud) / SRE

TEKsystems • Irving, TX

$55 - $75 • Hybrid

About The Position

This group focuses on Cloud Ops support but is starting to partner more closely with the peer SRE team, so an engineering background would be helpful, as the group may take on some SRE projects.

Requirements

BS/MS degree in Computer Science or a related technical field involving systems, or equivalent practical experience.
5+ years of hands-on experience supporting Kubernetes/OpenShift/RKE/EKS container platforms.
Experience with Python, Ansible, Golang, and shell scripting
Strong experience with major services related to compute, storage, networking, and security
Experience with monitoring tools like Prometheus and Dynatrace, as well as cloud native tools like Azure Monitor and Log Analytics
Strong understanding of complex IAM infrastructure, including Active Directory, Azure AD Connect, Azure AD, and Ping Identity or other SSO solutions.
Advanced knowledge of Linux, DNS, DHCP, Kerberos, and Windows authentication
Experience with CI/CD tools (Git, Jenkins) and the GitOps model
Excellent understanding of Linux/Windows operating system administration
Experience with container security and vulnerability remediation.
Systematic problem-solving approach, sense of ownership and drive
Ability to juggle competing priorities and adapt to changes in project scope.
Excellent interpersonal, organizational and communication (written, verbal, and presentation) skills are a must.
OpenShift
Kubernetes
Azure
Terraform
Linux

Nice To Haves

Kubernetes/OpenShift/Terraform certifications are a plus

Responsibilities

Responsible for the reliability and support of the container platform on-prem and in external clouds (Azure/AWS/Google)
Monitor and troubleshoot performance, connectivity, and security issues in the container platform (OpenShift), Rancher (RKE), and Azure (AKS) environments
Perform deep dives into systemic and latent reliability issues; handle incident management and problem management
Identify, analyze, and resolve infrastructure vulnerabilities and application deployment issues.
Perform blameless root cause analysis (RCA) and partner with engineering and operations teams across the organization to roll out fixes.