Data Center Ops Resilience Engineer

Guild Mortgage
Onsite

About The Position

Guild Mortgage is seeking a highly technical and motivated Data Center Operations Engineer, Resilience, to support and advance the company’s enterprise resilience and security posture. This role is responsible for the administration, support, and continuous improvement of technical capabilities related to high availability, disaster recovery, backup, recovery, cyber resilience, and operational readiness across critical infrastructure platforms. This position plays an important role in supporting Guild’s strategic direction that resilience must be built into design, implementation, and operational processes from the outset, not addressed after operational or cybersecurity risks have already been introduced.

Requirements

  • Bachelor’s degree directly related to the position preferred, or an equivalent computer-related degree from a technical school or similar training.
  • A combination of education and experience may be considered in lieu of the Bachelor’s degree.
  • Minimum of five years of experience supporting one or more of the following areas: high availability, disaster recovery, backup, replication, recovery, cybersecurity, or infrastructure operations.
  • Strong analytical, troubleshooting, and problem-solving skills.
  • Strong technical aptitude with the ability to work across multiple infrastructure platforms in on-premises, hybrid, and cloud operational domains.
  • Ability to collaborate effectively across technical teams and work within established operational processes.
  • Strong interpersonal and customer service skills, with the ability to build trusted relationships across technical and business teams, communicate complex concepts clearly, and respond to stakeholder needs with professionalism and urgency.
  • Demonstrates active listening, empathy, and a solutions-oriented mindset to ensure a high-quality service experience and positive outcomes.
  • Possess an “Engineering Spirit”: The ability to identify legacy and inefficient practices, challenge outdated operational approaches, and drive modernization through the adoption of industry best practices and proven real-world technology solutions.
  • Possess an “Engineering Spirit”: The ability to evaluate the environment through both a security and resilience lens, identify and make gaps visible, and ensure remediation is driven through the appropriate operational, risk, and governance channels.
  • Demonstrated ability to identify technical or operational inefficiencies and contribute to sustainable improvements.
  • Knowledge of and exposure to Active Directory/Entra, enterprise backup solutions, enterprise replication solutions, and server virtualization.
  • Self-starter with the demonstrated ability to learn/adapt to new technologies and techniques.
  • Ability to organize and manage multiple priorities simultaneously in a fast-paced, deadline-driven environment.
  • Passionate about delivering excellence in customer service within a team environment.
  • Patience and the ability to train less experienced team members, respond to questions, and build capability.
  • Ethical, with a commitment to company values.
  • Excellent verbal and written communication skills.
  • Highly organized and detail-oriented; ability to work in a fast-paced, metrics-driven environment.
  • Proficiency in Microsoft Office Suite (including Word and Excel), wikis, collaborative cloud-based programs, and third-party software applications required.
  • Commitment to company.
  • Customer Service - Proactive attention to each person.
  • Integrity - Do and say what's right.
  • Respect - Treat others with dignity.
  • Collaboration - Listen and work together.
  • Learning - Seek knowledge and strive for improvement.
  • Excellence - Deliver the unexpected.

Nice To Haves

  • Experience with resilience-related technologies is strongly desired.
  • Prior IBMi knowledge is a plus.

Responsibilities

  • Support the design, administration, monitoring, and continuous improvement of enterprise resilience capabilities across data center and infrastructure environments.
  • Maintain and enhance solutions supporting high availability, disaster recovery, backup, replication, and recovery operations.
  • Identify, assess, and help remediate cybersecurity risks, resilience gaps, and operational weaknesses through the application of sound engineering practices and established standards.
  • Partner with infrastructure, security, and application teams to ensure resilience and recoverability requirements are incorporated into technical solutions and operational processes from the beginning.
  • Participate in resilience testing activities, including failover exercises, disaster recovery validation, backup recovery testing, and operational readiness reviews.
  • Support monitoring, alerting, incident response, escalation processes, and service restoration efforts related to resilience platforms and technologies.
  • Develop, maintain, and improve technical documentation, operational procedures, standards, and runbooks.
  • Contribute to process improvement initiatives that strengthen platform stability, recoverability, and overall security posture.
  • Participate in troubleshooting, root cause analysis, and corrective action planning for resilience-related incidents and service disruptions.
  • Support cross-platform resilience efforts involving compute, storage, backup, network, and platform services.
  • Perform other duties as assigned.

Benefits

  • pleasant work environment
  • competitive compensation
  • excellent benefits package
  • medical insurance
  • dental insurance
  • vision insurance
  • life insurance
  • AD&D
  • LTD
  • 401(k) with employer match