Site Reliability Engineer

NTT DATA, Austin, TX

About The Position

As a Site Reliability Engineer (SRE), you will be responsible for ensuring the reliability, scalability, and performance of mission-critical cloud platforms and applications across multi-cloud environments including Azure, AWS, Google Cloud Platform (GCP), and Oracle Cloud Infrastructure (OCI). You will bridge the gap between development and operations by implementing automation, monitoring, and reliability engineering practices. This role emphasizes proactive system reliability, incident response, and continuous improvement through observability, infrastructure-as-code, and DevSecOps methodologies. You will work closely with cloud architects, DevOps teams, and security engineers to deliver highly resilient and efficient cloud services at scale.

Requirements

  • 5+ years of experience in Site Reliability Engineering, DevOps, or cloud operations roles.
  • Hands-on experience with at least one major public cloud provider (Azure, AWS, GCP, or OCI); multi-cloud experience preferred.
  • Experience with Infrastructure-as-Code using Terraform.
  • Experience with source control and CI/CD pipelines (GitHub).
  • Experience with monitoring and observability tools (Prometheus, Grafana).
  • Experience with incident management and production support.
  • Experience with scripting or programming (Python, Bash, or similar).
  • Strong understanding of system architecture, networking, and distributed systems.
  • Experience with REST APIs and automation frameworks.

Nice To Haves

  • Experience integrating ServiceNow with monitoring and incident workflows.
  • Familiarity with CyberArk and Appgate for secure access and credential management.
  • Experience with containerization and orchestration (Docker, Kubernetes).
  • Knowledge of MuleSoft or API integration platforms.
  • Understanding of FinOps and cloud cost optimization strategies.
  • Relevant certifications (e.g., AWS Certified DevOps Engineer, Azure DevOps Engineer, Google Professional Cloud DevOps Engineer).

Responsibilities

  • Reliability Engineering & System Availability
  • Automation & Infrastructure as Code (IaC)
  • Observability & Monitoring
  • Incident Management & Response
  • Security, Compliance & Best Practices
  • Continuous Improvement & Collaboration

Benefits

  • Medical, dental, and vision insurance with an employer contribution
  • Flexible spending or health savings account
  • Life and AD&D insurance
  • Short- and long-term disability coverage
  • Paid time off
  • Employee assistance
  • Participation in a 401(k) program with company match
  • Additional voluntary or legally required benefits


What This Job Offers

Job Type

Full-time

Career Level

Mid Level

Education Level

No Education Listed

Number of Employees

5,001-10,000 employees

© 2024 Teal Labs, Inc