Cloud Operations & Support
Provision, configure, and maintain cloud infrastructure across AWS, Azure, GCP, and OCI.
Monitor, troubleshoot, and resolve incidents, performance issues, and service outages in production and staging environments.
Implement and maintain monitoring, alerting, and logging solutions to ensure high availability and reliability.
Lead root cause analysis and post-mortem documentation for major incidents.
Execute patch management, upgrades, and regular maintenance activities.
Develop and maintain backup, disaster recovery, and failover strategies and operations.
Participate in on-call rotation and after-hours support as required.

Automation & Infrastructure Management
Develop and maintain Infrastructure as Code (IaC) templates using tools such as Terraform, CloudFormation, ARM, or OCI Resource Manager.
Use scripting (e.g., Python, Bash, PowerShell) to automate repetitive tasks and operational processes.
Champion the use of configuration management tools and assist in DevOps pipeline integrations.
Recommend and implement cost optimization, resource utilization, and rightsizing strategies.
Ensure adherence to security best practices, including least-privilege access, encryption, and network segmentation.
Implement and manage identity and access management (IAM) policies and roles.
Monitor, identify, and remediate security vulnerabilities reported by scanning tools or external advisories.
Support compliance efforts related to customer and regulatory requirements (TxRAMP, ISO, SOC2, etc.).

Collaboration & Documentation
Work closely with application, security, and network teams for solution delivery and support.
Mentor junior engineers and provide technical guidance as needed.
Create and update technical documentation, runbooks, and SOPs.
Participate in client calls to provide technical input when required.
Job Type: Full-time
Career Level: Mid Level
Number of Employees: 5,001-10,000 employees