SRE/DevOps Engineer - 66763

Hitachi • Dallas, TX

About The Position

As an SRE Operations Engineer, you will serve as the first line of operational support for monitoring, triaging, and resolving infrastructure and application alerts. This role focuses heavily on incident response, system monitoring, troubleshooting, and escalation management across Kubernetes environments, APIs, cloud platforms, and enterprise applications. The ideal candidate will have foundational experience in IT Operations, NOC, Production Support, SRE, or DevOps environments and be comfortable following runbooks, documenting issues, and collaborating with technical teams during incident resolution. This is an excellent opportunity for candidates looking to grow their cloud and SRE skillset within a structured operational environment.

Requirements

Experience supporting production systems, NOC environments, or IT operations
Exposure to monitoring tools such as Grafana, Datadog, Splunk, or Prometheus
Basic understanding of Kubernetes and containerized environments
Familiarity with Linux commands and troubleshooting fundamentals
Ability to follow operational runbooks and escalation procedures
Strong troubleshooting and analytical skills
Exposure to cloud platforms such as AWS, Azure, or GCP
Basic scripting knowledge in Bash, Python, or PowerShell is a plus
Excellent communication and incident documentation skills
Experience with ticketing or ITSM tools such as ServiceNow or Jira
1–5 years of experience in IT Operations, NOC, Production Support, SRE, or related environments
Foundational understanding of networking, Linux, and cloud-based applications
Strong willingness to learn and grow within Site Reliability Engineering and cloud operations
Ability to work in a fast-paced support environment and manage multiple priorities simultaneously

Nice To Haves

Internship, support center, or operational monitoring experience will also be considered

Responsibilities

Serve as the first line of operational support for monitoring, triaging, and resolving infrastructure and application alerts.
Focus on incident response, system monitoring, troubleshooting, and escalation management across Kubernetes environments, APIs, cloud platforms, and enterprise applications.
Follow runbooks, document issues, and collaborate with technical teams during incident resolution.
Grow your cloud and SRE skills within a structured operational environment.

Benefits

Industry-leading benefits, support, and services that look after your holistic health and wellbeing
Flexible arrangements that work for you (role and location dependent)
