Observability Operations Engineer

OmegaHires • Phoenix, AZ

5d • Hybrid

Requirements

Deep knowledge of Linux systems administration
Strong hands-on experience with containerized environments (Docker) and with Kubernetes in production environments
Extensive experience in system administration across enterprise environments
Strong exposure to ITSM processes and hardware/software lifecycle management
Superior troubleshooting and root cause analysis skills
Strong knowledge of Elasticsearch architecture, configuration, concepts, and performance tuning
Deep familiarity with networking concepts (TCP/IP, DNS, load balancing, firewalls, routing)

Nice To Haves

Experience with Rancher (preferred but not mandatory)

Responsibilities

Manage and support Linux-based infrastructure and containerized environments (Docker, Kubernetes).
Administer and optimize large-scale Elasticsearch clusters (configuration, scaling, performance tuning, troubleshooting).
Provide end-to-end system administration support across environments.
Perform deep-dive troubleshooting across infrastructure, network, and observability stack components.
Support ITSM processes, including incident, change, and problem management.
Manage hardware and software lifecycle activities.
Ensure platform stability, high availability, and performance optimization.
Collaborate with platform engineering and SRE teams to improve observability maturity.
Assist in deployment, upgrades, and operational governance of observability tools.
Contribute to automation and operational efficiency improvements.