As a Platform Engineer Lead, Disaster Recovery and Resiliency, you will be responsible for the operational side of disaster recovery and resilience. You will develop, implement, and maintain a resiliency framework and capabilities that application teams can consume, through automated product offerings or repeatable patterns, to attest to the validity and viability of their disaster recovery plans in line with business outcomes. You will partner with infrastructure and application teams to design and implement scripts, templates, and workflows that automate disaster recovery for their products. This includes automation for all relevant resiliency elements: disaster recovery provisioning and scaling, configuration management, monitoring and observability, resynchronization and reconciliation, and testing. You will work closely with project managers, technical leads, and business stakeholders to identify testing scenarios for potential threats, assess impacts, and design testing solutions that ensure business continuity and minimize risk. You will perform detailed evaluations of platform and application resiliency readiness to identify areas of concern, and you will conduct regular testing, monitoring, and reporting on resiliency and disaster recovery plans and activities. You will develop the capability to capture the book of record for all disaster recovery data, and you will identify gaps and opportunities for continuous improvement. You will design and implement data collection scripts, implement and maintain monitoring tools, and develop front-end dashboards that monitor the health, performance, and utilization of Capital's recovery environment, enabling a prompt response when warning signs appear. You will also support Global Risk in meeting its requirements to report to regulators on our disaster recovery efforts.
Job Type
Full-time
Career Level
Mid Level