Senior Site Reliability Engineer (SRE)

CGI • Westlake, TX

Onsite

About The Position

We are seeking a Senior Site Reliability Engineer to help build, operate, and modernize highly resilient, cloud-native platforms that support mission-critical applications. This role sits at the intersection of reliability engineering, cloud infrastructure, DevOps, and production engineering. It is ideal for technologists who enjoy solving complex distributed-systems challenges in fast-moving environments.

In this role, you will partner closely with development teams, business stakeholders, and operations groups to ensure the stability, performance, and scalability of applications running in both on-premises and cloud environments. Production support is a core responsibility, but the focus goes well beyond incident response: you will actively reduce operational toil, improve observability, automate wherever possible, and contribute directly to modern engineering practices within development squads.

This is a hands-on, highly technical role in which your work directly influences platform reliability, customer experience, and the organization’s ability to modernize and scale with confidence. This position must be performed onsite in Westlake, TX.

Requirements

Bachelor's degree in Computer Science, Engineering, Information Technology, or related field (Master's a plus)
6+ years of hybrid experience across Production Support, SRE, and Software Development
Hands-on experience supporting and deploying highly distributed, multi-tiered systems at scale
3+ years of experience with AWS (cloud development, migration, resiliency engineering)
3–6+ years of development experience with Python, Node.js, or Java, with strong SDLC and automation focus
3–6+ years of hands-on Kubernetes experience (deployment, troubleshooting, cluster operations)
Deep expertise with observability tools such as Prometheus, Grafana, ELK/OpenSearch, OpenTelemetry, Datadog, and Splunk
Strong instrumentation skills for monitoring, logging, alerting, and distributed systems telemetry
Advanced scripting skills (Python, Shell, etc.) for automation and tooling
Solid understanding of DevOps, CI/CD pipelines, and tools such as Jenkins, JenkinsCore, Artifactory, uDeploy, and SonarQube
Hands-on Linux experience (permissions, file systems, performance tuning)
Working knowledge of SQL and databases such as Oracle, MySQL, PostgreSQL, or DynamoDB
Experience with ETL and data processing tools (Control-M, Informatica)
Familiarity with ITSM processes (Incident, Change, Problem Management)
Strong communication, collaboration, and relationship-building skills
Ability to work independently, manage priorities, and operate in fast-moving environments

Nice To Haves

AWS or Kubernetes certifications are a plus

Responsibilities

Build and maintain infrastructure as code
Expand test automation
Drive CI/CD improvements
Enhance observability
Reduce toil and manual processes
Contribute directly to development squads and participate in end-to-end agile workflows
Share knowledge while learning from senior engineers and cross-functional partners
Build strong relationships across engineering, business, and vendor teams
Represent team initiatives in user groups, technical forums, and leadership discussions
Drive cloud-centric development, modernization, and research initiatives
Operate effectively in unstructured environments and resolve high-impact incidents quickly
Think creatively to design secure, innovative solutions beyond traditional patterns