Site Reliability Engineer I

GM Financial•Arlington, TX

Hybrid

About The Position

Why GMF Technology?

At GM Financial, innovation drives everything we do. We’re not just adopting technology; we’re shaping the future of software delivery. From generative AI and cloud-native platforms to advanced release engineering practices, our teams are redefining how financial technology operates. This role is central to that transformation, influencing how we build, release, and scale software globally. Join us and discover a workplace where your ideas matter, your development is prioritized, and you can truly make a global impact.

About The Role

The Site Reliability Engineer, under general direction from leadership, will assist in the day-to-day tasks critical to the team's success. The position will be responsible for supporting cloud infrastructure architecture and components, including hybrid cloud and Public Cloud platforms. This will include prototyping, initiating, and operationalizing Public Cloud solutions. The role will also support overall Cloud Transformation initiatives designed to create a service-driven culture through the performance and delivery of SaaS, PaaS, and IaaS solutions from public cloud vendors such as Azure and AWS. The Site Reliability Engineer will be responsible for the configuration, efficiency, and performance of the deployed public cloud solutions. The scope of the role includes not only cloud engineering but also advanced automation capabilities, with some overlap into software development disciplines.

Build and demonstrate a foundational understanding of SRE concepts, including observability, monitoring, incident response, and the core systems owned by the team. Execute standard operational tasks independently using established processes, runbooks, tooling, and escalation paths; raise issues when scenarios become complex or unfamiliar. Perform initial troubleshooting for clear production or environment issues with limited guidance; contribute findings and next steps to the broader resolution effort. Demonstrate ownership of learning by seeking mentorship, asking questions, and contributing back to shared team knowledge.

Help teams apply SRE operational readiness practices using the SRE Checklist, with emphasis on detection/observability, performance, resiliency, automation, and operational readiness before go‑live. Assist with defining and implementing basic monitoring coverage aligned to Golden Signals (e.g., latency, traffic, errors, saturation/capacity) and validate that telemetry appears correctly in monitoring platforms. Follow established standards for cloud-based resources in the Azure environment for automation and troubleshooting. Support logging and exception-handling hygiene by aligning to known standards (e.g., ensuring correlation IDs and key dimensions are captured where required). Assist with and provide systems administration setup/configuration as needed for supported services and environments. Contribute to toil reduction by helping implement/maintain repeatable operational mechanisms (e.g., health checks/probes and monitoring configuration) as defined in standards and patterns.

Requirements

Thorough command of both the Windows and Linux operating systems, with a strong background in troubleshooting either
Knowledge of native Kubernetes or related enterprise container platforms such as OpenShift
Good understanding of the mechanics of the container platform and the deployment pipeline that feeds it
Knowledge of Public Cloud Governance frameworks, architectures, configurations, services, and solutions, specifically within Microsoft Azure, but may also include AWS and GCP
Knowledge of core Azure services and tools such as Azure Kubernetes Service, Cosmos DB, Azure Functions, Azure Storage entities and concepts, the Azure CLI, and PowerShell cmdlets
Knowledge of Azure organizational entities such as Departments, Accounts, Subscriptions, Resource Groups, and Management Groups
Strong automation skills in Linux and Windows, including Bash, Python, and PowerShell
Extensive experience with Terraform plans and associated development
Knowledge of ARM templates and various related automation methods within Azure
Experience with modern source control repositories (e.g., Git) and DevOps toolsets (Jenkins, Ansible, etc.), and familiarity with Agile/Scrum methodologies
Experience with cloud-native and microservice architectures and an understanding of design principles for scalability, performance, and reliability
Experience with distributed systems, asynchronous messaging, and networking protocols
Experience with open source applications, frameworks, and libraries
Fast learner; proactive thinker
Ability to innovate, automate, and continually improve processes
Excellent verbal and written communication skills
Possess critical thinking and analytical skills
Capacity to take initiative; desire to become a self-starter
Willingness to find problems and come up with creative solutions
Ability to balance priorities in order to meet multiple requirements and deadlines while ensuring priority objectives receive proper emphasis
Ability to accept change and adapt to shifting priorities
Effective time management and prioritization skills
Able to think and react positively and professionally when faced with obstacles
A strong willingness to learn and accept instruction
High School Diploma or equivalent required
Bachelor’s Degree in related field or equivalent work experience within the IT field required
3-5 years of experience in cloud computing, DevOps, and all related automation disciplines preferred

Nice To Haves

Advanced job-related certifications preferred but not required
Exposure to Golden Signals–based monitoring (latency, traffic, errors, saturation) and the discipline of validating telemetry and alert behavior preferred
Exposure to reliability engineering concepts such as SLOs/SLIs and how reliability goals connect to real production operations preferred
Familiarity with cloud and runtime fundamentals (e.g., Windows/Linux basics and cloud platform exposure such as Azure) preferred
Familiarity with modern engineering ways of working that support reliability outcomes (e.g., documentation habits, continuous improvement mindset in a DevOps culture) preferred
.NET coding experience preferred

Responsibilities

Assist in the day-to-day tasks critical to the team's success.
Support cloud infrastructure architecture and components, including hybrid cloud and Public Cloud platforms.
Prototype, initiate, and operationalize Public Cloud solutions.
Support overall Cloud Transformation initiatives.
Ensure performance and delivery of SaaS, PaaS, and IaaS solutions from public cloud vendors such as Azure and AWS.
Be responsible for the configuration, efficiency, and performance of deployed public cloud solutions.
Engage in cloud engineering, advanced automation capabilities, and software development disciplines.
Build and demonstrate a foundational understanding of SRE concepts, including observability, monitoring, incident response, and core systems.
Execute standard operational tasks independently using established processes, runbooks, tooling, and escalation paths.
Raise issues when scenarios become complex or unfamiliar.
Perform initial troubleshooting for production or environment issues with limited guidance.
Contribute findings and next steps to the broader resolution effort.
Demonstrate ownership of learning by seeking mentorship, asking questions, and contributing back to shared team knowledge.
Help teams apply SRE operational readiness practices using the SRE Checklist.
Assist with defining and implementing basic monitoring coverage aligned to Golden Signals.
Validate that telemetry appears correctly in monitoring platforms.
Follow established standards for cloud-based resources in the Azure environment for automation and troubleshooting.
Support logging and exception-handling hygiene by aligning to known standards.
Assist with and provide systems administration setup/configuration as needed for supported services and environments.
Contribute to toil reduction by helping implement/maintain repeatable operational mechanisms.