Observability Operations Engineer

OmegaHires · Phoenix, AZ
Hybrid

Requirements

  • Deep knowledge of Linux systems administration
  • Strong hands-on experience with containerized environments (Docker) and with Kubernetes in production environments
  • Extensive experience in system administration across enterprise environments
  • Strong exposure to ITSM processes and hardware/software lifecycle management
  • Superior troubleshooting and root cause analysis skills
  • Strong knowledge of Elasticsearch concepts, architecture, configuration, and performance tuning
  • Deep familiarity with networking concepts (TCP/IP, DNS, load balancing, firewalls, routing)

Nice To Haves

  • Rancher (preferred but not mandatory)

Responsibilities

  • Manage and support Linux-based infrastructure and containerized environments (Docker, Kubernetes).
  • Administer and optimize large-scale Elasticsearch clusters (configuration, scaling, performance tuning, troubleshooting).
  • Provide end-to-end system administration support across environments.
  • Perform deep-dive troubleshooting across infrastructure, network, and observability stack components.
  • Support ITSM processes, including incident, change, and problem management.
  • Manage hardware and software lifecycle activities.
  • Ensure platform stability, high availability, and performance optimization.
  • Collaborate with platform engineering and SRE teams to improve observability maturity.
  • Assist in deployment, upgrades, and operational governance of observability tools.
  • Contribute to automation and operational efficiency improvements.
© 2024 Teal Labs, Inc