Site Reliability Engineer (SRE)

Ampcus

48d•Onsite

About The Position

Ampcus Inc. is a certified global provider of a broad range of Technology and Business consulting services. We are in search of a highly motivated candidate to join our talented team.

Job Title: Site Reliability Engineer (SRE)
Location(s): Fort Mill, SC

Role Overview

The role requires a strong Site Reliability Engineering (SRE) and DevOps professional with deep expertise in observability, automation, cloud platforms, and modern operational practices. The individual will provide oversight of production operations, drive reliability and scalability, and enable proactive, data-driven operations through automation, AIOps, and continuous improvement.

Requirements

Strong developer background with the ability to understand application-layer behavior and its interaction with infrastructure platforms.
End-to-end understanding of the software delivery lifecycle—from code management through deployment.
Quick learner with the ability to rapidly adopt new tools, scripting languages, or technologies as required.
Proactive, solution-oriented mindset with a strong focus on reliability, scalability, and operational excellence.

Responsibilities

Design, build, and maintain enterprise-grade dashboards for monitoring, observability, and operational insights.
Implement intelligent alerting systems across multiple platforms to proactively identify and mitigate issues.
Deliver full-stack observability solutions, including monitoring, logging, tracing, and event management integrations.
Provide oversight of production operations to maximize service reliability, resiliency, and automation.
Define, implement, and continuously evolve SRE practices, procedures, tooling, and runbooks.
Monitor system capacity, performance, and health trends; provide analytics, forecasting, and capacity recommendations.
Drive a proactive operational model, focusing on prevention and optimization rather than reactive incident response.
Design, develop, and roll out CI/CD frameworks across hybrid and multi-cloud environments.
Implement Infrastructure as Code (IaC) solutions using Terraform and cloud-native tooling.
Facilitate release and deployment management across multiple non-production and production environments.
Build, deploy, and manage DevOps pipelines on AWS, Azure, and GCP.
Provide day-to-day technical direction and innovation for platform services, with a strong focus on Azure.
Enable core platform capabilities including cloud connectivity, infrastructure, and shared services at scale.
Implement AIOps and data-driven operational tooling and dashboards to improve decision-making and operational efficiency.
Identify opportunities to automate repetitive or manual processes; champion automation-first thinking.
Identify inefficiencies within Platform Services Operations and lead continuous improvement initiatives.
Define, document, and maintain standard operating procedures, runbooks, and architectural documentation.
Create and maintain system architecture diagrams and operational documentation using Jira, Confluence, and UML.
Translate discussions from troubleshooting, design sessions, and brainstorming meetings into clear architecture diagrams and actionable plans.
Ensure operational processes are executed with high attention to detail, speed, and on-time delivery.
Act as an out-of-the-box thinker, continuously challenging traditional processes and driving innovation through automation.