Senior Cloud Operations Engineer

Enterprise AI decisioning and workflow automation platform•Columbia, DC

Remote

About The Position

As a member of the Service Reliability Team (SRT), you will play a critical role in ensuring the reliability, availability, performance, and security of Pega’s cloud infrastructure platforms. We operate a global, follow‑the‑sun 24x7 model with teams across Krakow, Bangalore, Sydney, and the U.S. East Coast. SRT values diversity, collaboration, intellectual curiosity, and a strong ownership mindset. We foster an environment that emphasizes mentorship, continuous learning, and operational excellence at scale. In this role, you will work hands‑on with cloud infrastructure and platform services that underpin Pega’s SaaS offerings. You will take ownership of the systems you operate—digging deep to identify root causes, strengthening resiliency, and preventing recurrence through automation and operational rigor. You’ll collaborate closely with Engineering, Cloud Engineering, Security, and Customer Support to deliver best‑in‑class infrastructure reliability for mission‑critical customer environments.

Requirements

3+ years supporting enterprise cloud infrastructure or cloud operations for SaaS platforms
2+ years of hands‑on experience operating AWS infrastructure services (experience with GCP a plus)
2+ years of Linux systems administration experience
Working knowledge of AWS services, including but not limited to: EC2, EBS, S3, ELB / load balancing, VPC, Transit Gateway, and Route 53
Experience supporting highly available, fault‑tolerant cloud environments
Familiarity with infrastructure automation and scripting (Bash, Shell, Python, or similar)
Exposure to networking concepts (DNS, load balancing, firewalls) as part of broader infrastructure operations
U.S. citizenship is required due to the nature of the work with FedRAMP.
An experienced cloud operations or infrastructure engineer with a strong platform and systems mindset
Self‑directed, analytical, and driven by continuous improvement and operational excellence
Comfortable collaborating with cross‑functional, globally distributed teams
Able to learn and adopt new tools, technologies, and operational patterns quickly
Effective in fast‑paced, enterprise‑scale environments
Customer‑focused, with empathy and accountability for production systems

Nice To Haves

AWS or GCP certification preferred
CCNA/CCNP a plus, but not required

Responsibilities

Monitor, respond to, and resolve infrastructure alerts, incidents, service requests, and changes within SLA
Own and drive customer-impacting escalations with a focus on stability and service restoration
Provision, operate, and upgrade cloud infrastructure components across compute, storage, and platform services
Troubleshoot complex infrastructure and platform issues, perform root cause analysis, and contribute to long‑term fixes
Create, maintain, and continuously improve runbooks, SOPs, and operational standards
Partner with Engineering on pre‑release validation and operational readiness of new platform capabilities
Identify opportunities to automate manual or repetitive operational tasks and reduce operational toil
Participate in infrastructure‑focused projects and adapt to evolving business and platform requirements
Participate in an after‑hours on‑call rotation, including weekend coverage
Support FedRAMP‑compliant environments (U.S. citizenship and residency required)

Benefits

Continuous learning, certification, and career development opportunities
An inclusive, flexible, and globally collaborative work environment
Competitive compensation including base pay, incentive plan, and equity participation
Base salary range for this role is 102,400 - 153,200 USD annually.
This role may also be eligible for an annual bonus or commission, as well as benefits and other incentives.
