Sr. Staff Cloud Platform Engineer - Security

UKG • Atlanta, GA

$129,500 - $186,100

About The Position

At UKG, our purpose is people. We are seeking a highly technical Cloud Resilience Analyst to join our Resilience team within the Cyber Security organization. This is not a traditional Business Continuity Planning (BCP) or Disaster Recovery (DR) compliance role. We are looking for a true cloud practitioner: someone who has worked hands-on building highly available architectures and is now ready to step into a strategic advisory role. In this position, you will act as a primary consultant to multiple product engineering teams and enterprise groups across UKG. You will guide them in designing, implementing, and validating fast-failover, highly redundant solutions for our SaaS applications. If you have a passion for eliminating single points of failure and designing self-healing infrastructure on modern cloud platforms, this role is for you.

Requirements

Deep, practical technical knowledge of Google Cloud Platform (GCP) core services, specifically GKE, Compute Engine, and Cloud SQL.
Familiarity with AWS and Azure is highly desirable.
Proven experience as a hands-on engineer who has deployed complex infrastructure. You should understand the implementation details well enough to effectively guide engineering teams.
Demonstrated success in architecting active-active or active-passive fast failover mechanisms for high-volume, data-intensive SaaS applications.
Strong understanding of database clustering, replication, and migration strategies (especially migrating legacy RDBMSs such as MS SQL Server to cloud-native solutions such as Cloud SQL).
Excellent communication and consulting skills, with the ability to influence technical teams, explain complex architectural concepts, and foster a culture of resilience without having direct reporting authority over the engineering teams.

Nice To Haves

Practical experience designing and managing high-availability network topologies, including load balancing, DNS and name resolution, firewalls/gateways, identity/authentication systems, and centralized logging/SIEM.
Deep understanding of user session replication, session state persistence, and failover routing strategies in high-traffic, multi-region application architectures.
Familiarity with assessing or designing resilience across a comprehensive range of critical SaaS failure domains, such as API gateways, caching layers, messaging/queuing systems, and CI/CD pipelines.

Responsibilities

Act as the primary resilience advisor to multiple distributed product and enterprise teams, guiding them on best practices for building high availability (HA) and redundancy into their SaaS applications.
Design and recommend fast-failover solutions and highly available infrastructure primarily on Google Cloud Platform (GCP), while also providing oversight for workloads in Azure and AWS.
Leverage your strong background in Infrastructure as Code (IaC) to review, validate, and guide the implementation efforts of engineering teams.
Design redundancy strategies for workloads running on Google Kubernetes Engine (GKE) and virtual machines, ensuring self-healing deployments.
Partner closely with DevOps, SRE, and Product Engineering teams to champion resilience engineering principles, chaos testing, and failover validations across tier-0 mission-critical systems.