Senior Cloud Operations Engineer

Enterprise AI decisioning and workflow automation platform
Columbia, DC
Remote

About The Position

As a member of the Service Reliability Team (SRT), you will play a critical role in ensuring the reliability, availability, performance, and security of Pega’s cloud infrastructure platforms. We operate a global, follow‑the‑sun 24x7 model with teams across Krakow, Bangalore, Sydney, and the U.S. East Coast. SRT values diversity, collaboration, intellectual curiosity, and a strong ownership mindset. We foster an environment that emphasizes mentorship, continuous learning, and operational excellence at scale.

In this role, you will work hands‑on with cloud infrastructure and platform services that underpin Pega’s SaaS offerings. You will take ownership of the systems you operate, digging deep to identify root causes, strengthening resiliency, and preventing recurrence through automation and operational rigor. You’ll collaborate closely with Engineering, Cloud Engineering, Security, and Customer Support to deliver best‑in‑class infrastructure reliability for mission‑critical customer environments.

Requirements

  • 3+ years supporting enterprise cloud infrastructure or cloud operations for SaaS platforms
  • 2+ years of hands‑on experience operating AWS infrastructure services (experience with GCP a plus)
  • 2+ years of Linux systems administration experience
  • Working knowledge of AWS services, including but not limited to: EC2, EBS, S3, ELB / Load Balancing, VPC, Transit Gateway, Route 53
  • Experience supporting highly available, fault‑tolerant cloud environments
  • Familiarity with infrastructure automation and scripting (Bash, Shell, Python, or similar)
  • Exposure to networking concepts (DNS, load balancing, firewalls) as part of broader infrastructure operations
  • U.S. citizenship is required due to the nature of the work with FedRAMP
  • An experienced cloud operations or infrastructure engineer with a strong platform and systems mindset
  • Self‑directed, analytical, and driven by continuous improvement and operational excellence
  • Comfortable collaborating with cross‑functional, globally distributed teams
  • Able to learn and adopt new tools, technologies, and operational patterns quickly
  • Effective in fast‑paced, enterprise‑scale environments
  • Customer‑focused, with empathy and accountability for production systems

Nice To Haves

  • AWS or GCP certification preferred
  • CCNA/CCNP a plus, but not required

Responsibilities

  • Monitor, respond to, and resolve infrastructure alerts, incidents, service requests, and changes within SLA
  • Own and drive customer-impacting escalations with a focus on stability and service restoration
  • Provision, operate, and upgrade cloud infrastructure components across compute, storage, and platform services
  • Troubleshoot complex infrastructure and platform issues, perform root cause analysis, and contribute to long‑term fixes
  • Create, maintain, and continuously improve runbooks, SOPs, and operational standards
  • Partner with Engineering on pre‑release validation and operational readiness of new platform capabilities
  • Identify opportunities to automate manual or repetitive operational tasks and reduce operational toil
  • Participate in infrastructure‑focused projects and adapt to evolving business and platform requirements
  • Participate in an after‑hours on‑call rotation, including weekend coverage
  • Support FedRAMP‑compliant environments (U.S. citizenship and residency required)

Benefits

  • Continuous learning, certification, and career development opportunities
  • An inclusive, flexible, and globally collaborative work environment
  • Competitive compensation including base pay, incentive plan, and equity participation
  • Base salary range for this role is 102,400 - 153,200 USD annually.
  • This role may also be eligible for an annual bonus or commission, as well as benefits and other incentives.