Operations Engineer (Cloud)

Stifel • Memphis, TN

About The Position

The Operations Engineer role is focused on Amazon Web Services (AWS) disaster recovery, automation/IaC, and FinOps. This role partners with engineering, infrastructure, and security teams to improve resilience, standardize recovery processes, and increase cloud cost visibility and optimization. The Operations Engineer will develop scalable IaC modules and automated workflows that make recovery predictable, observable, and easier to operate across multiple applications and environments.

Requirements

Demonstrated experience implementing and supporting AWS cloud platform and resiliency solutions, including disaster recovery planning, testing, and operational support.
Proficiency with IT systems, including cloud services, networking, data storage, and compute; experience with backup/replication and cloud resiliency methods is desired.
Proficiency with automation and infrastructure-as-code tools (Terraform preferred) and scripting (Python, PowerShell, and/or Bash) to create repeatable operational workflows.
Strong understanding of disaster recovery concepts, including RTO/RPO definition, runbook development, recovery validation, and continuous improvement through testing and remediation.
Familiarity with industry standards and guidance such as ISO 22301, FFIEC guidelines, NIST SP 800-34, and ITIL, and the ability to apply those expectations in a cloud context.
Business understanding and the ability to apply technology solutions to business resiliency, risk reduction, and service continuity objectives.
Analytical skills and the ability to interpret cloud usage/cost data into actionable insights; familiarity with FinOps practices and tools such as Apptio Cloudability or AWS Cost Management is desired.
Strong written and verbal communication skills, including the ability to produce clear technical documentation and present to influential audiences.
Ability to work calmly under pressure and with tight deadlines, including during DR tests, incidents, and recovery activities.
Minimum Required: Bachelor’s degree in Computer Science, Information Systems, or a related field, or equivalent relevant experience.
Minimum Required: 5+ years of experience in cloud/platform engineering, DevOps/SRE, or systems engineering.
Demonstrated experience implementing and testing DR strategies in AWS.
Experience building and maintaining IaC and automation via scripting.
Minimum Required: At least one AWS certification (e.g., AWS Certified Solutions Architect – Associate/Professional, AWS Certified SysOps Administrator, or AWS Certified DevOps Engineer – Professional).

Nice To Haves

Preferred: Experience with AWS Elastic Disaster Recovery (DRS) and Terraform.
Preferred: FinOps Certified Practitioner.
Preferred: ITIL Foundation or similar operations/process certification.

Responsibilities

Configure and manage AWS Elastic Disaster Recovery (DRS) and support related DR services/patterns.
Create and maintain cutover runbooks for DR events (dependencies, communications, validations, rollback).
Plan and execute DR tests/failovers/failbacks; capture results and drive remediation.
Build and maintain Terraform infrastructure-as-code modules and standards.
Automate provisioning and repeatable operational workflows (scripting and tooling).
Improve DR readiness through monitoring/alerting for replication health, backups, and recovery workflows.
Coordinate with security/risk to ensure DR solutions meet control and compliance expectations.
Support FinOps practices, including tagging, allocation, reporting, and optimization enablement.
Use Apptio Cloudability (preferred) to analyze spend, trends, and optimization opportunities.