Site Reliability Engineer

TEKsystems
Chandler, AZ
$60 - $63
Hybrid

About The Position

Cloud Site Reliability Engineer (SRE) for Internal Cloud. Maintain services once they are live by measuring and monitoring availability, latency and overall system health. Troubleshoot issues across the entire stack: hardware, software, application and network. Perform deep dives into both systemic and latent reliability issues; partner with engineering and operations teams across the organization to produce and roll out fixes. Drive standardization efforts across multiple disciplines and services in conjunction with embedded SREs throughout the organization. Identify and drive opportunities to improve automation for the cloud services. Scope and create automation for deployment, management and visibility of our services.

Requirements

  • Cloud
  • Unix
  • Linux
  • Terraform
  • Java
  • Python
  • Ansible
  • Shell

Nice To Haves

  • Experience at a large, highly regulated company.
  • Ideally financial services experience.

Responsibilities

  • Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
  • Troubleshoot issues across the entire stack: hardware, software, application and network.
  • Perform deep dives into both systemic and latent reliability issues; partner with engineering and operations teams across the organization to produce and roll out fixes.
  • Drive standardization efforts across multiple disciplines and services in conjunction with embedded SREs throughout the organization.
  • Identify and drive opportunities to improve automation for the cloud services.
  • Scope and create automation for deployment, management and visibility of our services.

Benefits

  • Medical, dental & vision
  • Critical Illness, Accident, and Hospital
  • 401(k) Retirement Plan – Pre-tax and Roth post-tax contributions available
  • Life Insurance (Voluntary Life & AD&D for the employee and dependents)
  • Short and long-term disability
  • Health Spending Account (HSA)
  • Transportation benefits
  • Employee Assistance Program
  • Time Off/Leave (PTO, Vacation or Sick Leave)