Senior Site Reliability Engineer (SRE)

CGI
Westlake, TX
Onsite

About The Position

We are seeking a Senior Site Reliability Engineer to help build, operate, and modernize highly resilient, cloud-native platforms supporting mission-critical applications. This role sits at the intersection of reliability engineering, cloud infrastructure, DevOps, and production engineering, and is ideal for technologists who enjoy solving complex distributed systems challenges in fast-moving environments.

In this role, you will partner closely with development teams, business stakeholders, and operations groups to ensure the stability, performance, and scalability of applications running across both on-prem and cloud environments. While production support is a core responsibility, the focus goes far beyond incident response: you will actively reduce operational toil, improve observability, automate wherever possible, and contribute directly to modern engineering practices within development squads.

This is a hands-on, highly technical role where your work directly influences platform reliability, customer experience, and the organization’s ability to modernize and scale with confidence. This position must be performed onsite in Westlake, TX.

Requirements

  • Bachelor's degree in Computer Science, Engineering, Information Technology, or related field (Master's a plus)
  • 6+ years of hybrid experience across Production Support, SRE, and Software Development
  • Hands-on experience supporting and deploying highly distributed, multi-tiered systems at scale
  • 3+ years of experience with AWS (cloud development, migration, resiliency engineering)
  • 3–6+ years of development experience with Python, Node.js, or Java, with strong SDLC and automation focus
  • 3–6+ years of hands-on Kubernetes experience (deployment, troubleshooting, cluster operations)
  • Deep expertise with observability tools such as Prometheus, Grafana, ELK/OpenSearch, OpenTelemetry, Datadog, and Splunk
  • Strong instrumentation skills for monitoring, logging, alerting, and distributed systems telemetry
  • Advanced scripting skills (Python, Shell, etc.) for automation and tooling
  • Solid understanding of DevOps, CI/CD pipelines, and tools such as Jenkins, JenkinsCore, Artifactory, uDeploy, and SonarQube
  • Hands-on Linux experience (permissions, file systems, performance tuning)
  • Working knowledge of SQL and databases such as Oracle, MySQL, PostgreSQL, or DynamoDB
  • Experience with ETL and data processing tools (Control-M, Informatica)
  • Familiarity with ITSM processes (Incident, Change, Problem Management)
  • Strong communication, collaboration, and relationship-building skills
  • Ability to work independently, manage priorities, and operate in fast-moving environments

Nice To Haves

  • AWS or Kubernetes certifications

Responsibilities

  • Infrastructure as code
  • Test automation
  • CI/CD improvements
  • Observability enhancements
  • Reducing toil and manual processes
  • Contribute directly to development squads and complete agile workflows
  • Share knowledge while learning from senior engineers and cross-functional partners
  • Build strong relationships across engineering, business, and vendor teams
  • Represent team initiatives in user groups, technical forums, and leadership discussions
  • Drive cloud-centric development, modernization, and research initiatives
  • Operate effectively in unstructured environments and resolve high-impact incidents quickly
  • Think creatively to design secure, innovative solutions beyond traditional patterns

Benefits

  • Competitive compensation
  • Comprehensive insurance options
  • Matching contributions through the 401(k) plan and the share purchase plan
  • Paid time off for vacation, holidays, and sick time
  • Paid parental leave
  • Learning opportunities and tuition assistance
  • Wellness and well-being programs