SRE/DevOps Engineer - 66763

Hitachi, Dallas, TX

About The Position

As an SRE Operations Engineer, you will serve as the first line of operational support for monitoring, triaging, and resolving infrastructure and application alerts. This role focuses heavily on incident response, system monitoring, troubleshooting, and escalation management across Kubernetes environments, APIs, cloud platforms, and enterprise applications. The ideal candidate will have foundational experience in IT Operations, NOC, Production Support, SRE, or DevOps environments and be comfortable following runbooks, documenting issues, and collaborating with technical teams during incident resolution. This is an excellent opportunity for candidates looking to grow their cloud and SRE skillset within a structured operational environment.

Requirements

  • Experience supporting production systems, NOC environments, or IT operations
  • Exposure to monitoring tools such as Grafana, Datadog, Splunk, or Prometheus
  • Basic understanding of Kubernetes and containerized environments
  • Familiarity with Linux commands and troubleshooting fundamentals
  • Ability to follow operational runbooks and escalation procedures
  • Strong troubleshooting and analytical skills
  • Exposure to cloud platforms such as AWS, Azure, or GCP
  • Basic scripting knowledge in Bash, Python, or PowerShell is a plus
  • Excellent communication and incident documentation skills
  • Experience with ticketing or ITSM tools such as ServiceNow or Jira
  • 1–5 years of experience in IT Operations, NOC, Production Support, SRE, or related environments
  • Foundational understanding of networking, Linux, and cloud-based applications
  • Strong willingness to learn and grow within Site Reliability Engineering and cloud operations
  • Ability to work in a fast-paced support environment and manage multiple priorities simultaneously

Nice To Haves

  • Internship, support center, or operational monitoring experience may be considered

Responsibilities

  • Serve as the first line of operational support for monitoring, triaging, and resolving infrastructure and application alerts.
  • Focus on incident response, system monitoring, troubleshooting, and escalation management across Kubernetes environments, APIs, cloud platforms, and enterprise applications.
  • Follow runbooks, document issues, and collaborate with technical teams during incident resolution.
  • Grow your cloud and SRE skillset within a structured operational environment.

Benefits

  • Industry-leading benefits, support, and services that look after your holistic health and wellbeing
  • Flexible arrangements that work for you (role and location dependent)