Cloud Infrastructure Site Reliability Engineer (SRE) Lead I - Cloud Infrastructure Services

Who We Are:
Born digital, UST transforms lives through the power of technology. We walk alongside our clients and partners, embedding innovation and agility into everything they do. We help them create transformative experiences and human-centered solutions for a better world. UST is a mission-driven group of 29,000+ practical problem solvers and creative thinkers in more than 30 countries. Our entrepreneurial teams are empowered to innovate, act nimbly, and create a lasting and sustainable impact for our clients, their customers, and the communities in which we live. With us, you'll create a boundless impact that transforms your career and the lives of people across the world.

You Are:
As a Cloud Infrastructure Site Reliability Engineer (SRE) with expertise in multiple public cloud service provider platforms, you will be responsible for operating infrastructure solutions, following the principles and practices pioneered by Google's SRE model.

The opportunity:
· Design, build, and maintain highly available, scalable, and secure cloud infrastructure on platforms such as AWS, GCP, or Azure.
· Develop and implement automation for provisioning, monitoring, scaling, and incident response using Infrastructure-as-Code tools (e.g., Terraform, CloudFormation, Ansible).
· Monitor system reliability, capacity, and performance; proactively detect and address issues before they impact users.
· Respond to production incidents, participate in on-call rotations, and lead post-incident reviews to drive root cause analysis and reliability improvements.
· Collaborate with software engineering and security teams to ensure new services and features are production-ready and meet reliability standards.
· Build and maintain tools for deployment, monitoring, and operations; automate manual processes to reduce toil.
· Document operational processes and system architectures to ensure knowledge sharing and repeatability.
· Continuously evaluate and implement new technologies to improve system reliability, security, and efficiency.

This position description identifies the responsibilities and tasks typically associated with the performance of the position. Other relevant essential functions may be required.
Job Type: Full-time
Career Level: Mid Level
Number of Employees: 5,001-10,000 employees