Site Reliability Engineer

NTT DATA Services•Westlake, TX

1d•Onsite

About The Position

We are currently seeking a Site Reliability Engineer (SRE) to join our team in Westlake, Texas (US-TX), United States (US).

Position Overview

We are seeking a highly skilled Site Reliability Engineer (SRE) with strong expertise in Terraform, cloud infrastructure, DevOps, automation, and load balancing. The ideal candidate will be responsible for ensuring the reliability, scalability, performance, and availability of critical enterprise applications across hybrid and multi-cloud environments. This role requires a seasoned engineer who combines deep knowledge of F5/AVI load balancing technologies with modern DevOps practices, Infrastructure-as-Code (IaC), cloud platforms, CI/CD pipelines, and operational excellence.

Requirements

5+ years of experience in Site Reliability Engineering, DevOps Engineering, Platform Engineering, or related disciplines, including an understanding of reliability engineering principles, SLIs, SLOs, error budgets, and operational excellence.
5+ years of hands-on Terraform experience.
5+ years of experience supporting mission-critical enterprise applications in production environments.
5+ years of experience with cloud networking, security, and infrastructure architecture.
5+ years of hands-on experience managing hybrid cloud environments.
5+ years of automation experience using Python, Ansible, Shell scripting, or similar technologies.
5+ years of experience building reusable infrastructure modules and automated deployment frameworks.

Nice To Haves

AWS Certifications (Solutions Architect, DevOps Engineer, SysOps Administrator).
HashiCorp Terraform Certification.
Experience in large-scale financial services or highly regulated environments.

Responsibilities

Design, implement, and support highly available load balancing solutions using F5 BIG-IP, Broadcom AVI, and cloud-native load balancing services.
Build and maintain Infrastructure-as-Code (IaC) solutions using Terraform.
Develop automation solutions for infrastructure provisioning, configuration management, and operational workflows.
Support and enhance CI/CD pipelines using tools such as Jenkins, Azure DevOps, GitHub Actions, or similar platforms.
Collaborate with application, cloud, network, and platform teams to improve reliability, performance, and scalability.
Monitor production systems and proactively identify reliability, performance, and availability risks.
Implement Site Reliability Engineering best practices, including observability, incident management, capacity planning, and resiliency engineering.
Troubleshoot complex issues across networking, cloud infrastructure, load balancing, and application environments.
Support hybrid infrastructure environments spanning on-premises datacenters and public cloud platforms.
Participate in the on-call rotation and provide production support for critical business applications.