Senior Site Reliability Engineer II

Remitly
Boca Raton, FL
Hybrid

About The Position

About the Business: LexisNexis Risk Solutions is the essential partner in the assessment of risk. Within our Business Services vertical, we offer a multitude of solutions focused on helping businesses of all sizes drive higher revenue growth, maximize operational efficiencies, and improve customer experience. Our solutions help our customers solve difficult problems in the areas of Anti-Money Laundering/Counter Terrorist Financing, Identity Authentication & Verification, Fraud and Credit Risk mitigation, and Customer Data Management. You can learn more about LexisNexis Risk at https://risk.lexisnexis.com

About the Role: We are hiring a hands-on Senior Site Reliability Engineer (SRE) to actively build, operate, and improve the reliability of our production systems. This is not a purely advisory role: you will be directly involved in designing infrastructure, writing Terraform, improving observability, and responding to real production incidents. If you live near one of our offices, you may work a hybrid schedule. If not, this role is fully remote. We do not restrict applicants based on job site or posting location.

Job Title: Senior Site Reliability Engineer (SRE)
Location: Open (U.S.-based). No job site restrictions.
Work Model: Hybrid (if near an office) or Fully Remote
Employment Type: Full-time
Department: Engineering / Infrastructure

Requirements

  • 5+ years of hands-on experience in SRE, DevOps, or Infrastructure Engineering roles
  • Strong production experience in AWS
  • Required: Significant hands-on experience with Terraform in real-world environments
  • Experience operating monitoring and uptime platforms such as Grafana, Pingdom, and Uptrends
  • Strong Linux systems, networking, and troubleshooting skills
  • Experience supporting production systems through incident response and on-call rotations
  • Proficiency with GitHub and modern Git workflows
  • Experience building or maintaining CI/CD pipelines with Azure DevOps
  • Familiarity with ITSM and incident workflows using ServiceNow
  • Strong written communication skills with experience documenting systems and processes in Confluence
  • Ability to work independently in a remote or hybrid environment

Nice To Haves

  • Experience defining and operating against SLOs and error budgets
  • Infrastructure-as-Code best practices beyond Terraform (modules, testing, CI integration)
  • Experience with containers and orchestration (Docker, Kubernetes)
  • Experience supporting large-scale, high-availability production systems
  • Prior experience mentoring engineers or serving as a technical lead

Responsibilities

  • Design, build, and operate highly available, scalable systems in AWS
  • Write, maintain, and review Terraform to provision and manage infrastructure
  • Own and improve monitoring, alerting, and observability using Grafana, Pingdom, and Uptrends
  • Participate in a rotating on-call schedule, responding to production incidents and driving issues to resolution
  • Lead incident response, root cause analysis, and post-incident reviews with a focus on prevention and automation
  • Define and manage SLOs, SLIs, and error budgets
  • Build and improve CI/CD pipelines and operational workflows using Azure DevOps and GitHub
  • Work directly with application teams to improve reliability, performance, and deployability
  • Automate manual operational tasks to reduce toil
  • Maintain clear, actionable runbooks and documentation in Confluence
  • Track work, incidents, and operational improvements using Jira and ServiceNow
  • Mentor other engineers and help set SRE standards and best practices

Benefits

  • Competitive salary and comprehensive benefits
  • Flexible work location with hybrid or fully remote options
  • Real ownership of production systems and reliability outcomes
  • A culture that values automation, learning, and continuous improvement