Senior Site Reliability Engineer II

Remitly
Boca Raton, FL
Hybrid

About The Position

LexisNexis Risk Solutions is seeking a hands-on Senior Site Reliability Engineer (SRE) to build, operate, and improve the reliability of its production systems. The role involves designing infrastructure, writing Terraform, enhancing observability, and responding to production incidents. The position can be fully remote for candidates who do not live near an office, or hybrid for those who do. Applicants are not restricted by job site or posting location.

Requirements

  • 5+ years of hands-on experience in SRE, DevOps, or Infrastructure Engineering roles
  • Strong production experience in AWS
  • Significant hands-on experience with Terraform in real-world environments
  • Experience operating monitoring and uptime platforms such as Grafana, Pingdom, and Uptrends
  • Strong Linux systems, networking, and troubleshooting skills
  • Experience supporting production systems through incident response and on-call rotations
  • Proficiency with GitHub and modern Git workflows
  • Experience building or maintaining CI/CD pipelines with Azure DevOps
  • Familiarity with ITSM and incident workflows using ServiceNow
  • Strong written communication skills with experience documenting systems and processes in Confluence
  • Ability to work independently in a remote or hybrid environment

Nice To Haves

  • Experience defining and operating against SLOs and error budgets
  • Infrastructure-as-Code best practices beyond Terraform (modules, testing, CI integration)
  • Experience with containers and orchestration (Docker, Kubernetes)
  • Experience supporting large-scale, high-availability production systems
  • Prior experience mentoring engineers or serving as a technical lead

Responsibilities

  • Design, build, and operate highly available, scalable systems in AWS
  • Write, maintain, and review Terraform to provision and manage infrastructure
  • Own and improve monitoring, alerting, and observability using Grafana, Pingdom, and Uptrends
  • Participate in a rotating on-call schedule, responding to production incidents and driving issues to resolution
  • Lead incident response, root cause analysis, and post-incident reviews with a focus on prevention and automation
  • Define and manage SLOs, SLIs, and error budgets
  • Build and improve CI/CD pipelines and operational workflows using Azure DevOps and GitHub
  • Work directly with application teams to improve reliability, performance, and deployability
  • Automate manual operational tasks to reduce toil
  • Maintain clear, actionable runbooks and documentation in Confluence
  • Track work, incidents, and operational improvements using Jira and ServiceNow
  • Mentor other engineers and help set SRE standards and best practices

Benefits

  • Competitive salary and comprehensive benefits
  • Flexible work location with hybrid or fully remote options
  • Real ownership of production systems and reliability outcomes
  • A culture that values automation, learning, and continuous improvement
  • Eligibility for an annual incentive bonus