Lead Site Reliability Engineer - Remote

CentralSquare Technologies, Remote

About The Position

We are seeking a highly skilled Senior Cloud / DevOps Engineer with a strong background in AWS, automation, infrastructure as code, and networking to support and modernize our cloud environments. This role is hands-on and will partner closely with Cloud Operations, SREs, Networking, and Application teams to improve scalability, reliability, security, and operational efficiency across mission‑critical systems. The ideal candidate is comfortable operating at both the infrastructure and application layers, has strong troubleshooting skills, and can automate repeatable operational tasks while supporting high‑availability production workloads.

Requirements

  • Strong background in AWS
  • Strong background in automation
  • Strong background in infrastructure as code
  • Strong background in networking
  • Comfortable operating at both the infrastructure and application layers
  • Strong troubleshooting skills
  • Ability to automate repeatable operational tasks
  • Experience supporting high-availability production workloads
  • Experience with Terraform, CloudFormation, or equivalent
  • Experience with CI/CD pipelines
  • Experience with Python, Bash, PowerShell, or similar scripting languages
  • Experience with cloud networking (VPCs, subnets, routing, VPNs, security groups, NACLs)
  • Experience with the AWS Well-Architected Framework

Nice To Haves

  • Track record of reducing manual operational work through automation
  • Track record of improving deployment reliability and production stability
  • Experience driving faster recovery and clearer root cause analysis during incidents
  • History of strong partnership with CloudOps, Networking, and Application teams

Responsibilities

  • Design, build, and maintain AWS-based infrastructure supporting production and non-production environments
  • Implement and maintain Infrastructure as Code (IaC) using tools such as Terraform, CloudFormation, or equivalent
  • Develop and support CI/CD pipelines for infrastructure and application deployments
  • Partner with application teams to improve deployment reliability and performance
  • Create and maintain automation scripts and tooling (Python, Bash, PowerShell, etc.) to reduce manual operations
  • Improve system reliability through self-healing mechanisms, monitoring, and alerting
  • Support SRE-style practices including incident response, root cause analysis, and continuous improvement
  • Design and support cloud networking (VPCs, subnets, routing, VPNs, security groups, NACLs)
  • Troubleshoot complex network, connectivity, and performance issues across hybrid environments
  • Implement security best practices aligned with the AWS Well-Architected Framework
  • Participate in on-call rotations supporting critical production systems
  • Provide operational support, troubleshooting, and resolution for cloud-related incidents
  • Collaborate across CloudOps, Networking, DBAs, and Application teams
  • Document architectures, runbooks, and operational procedures

Benefits

  • Tuition reimbursement
  • Parental leave
  • Paid volunteer hours
  • Unlimited PTO