MAPFRE-posted 4 months ago
Webster, MA
5,001-10,000 employees

The Site Reliability Engineer (SRE) is a critical part of our MAPFRE USA On-Prem and Cloud platform strategy. In this role, you will ensure that MUSA’s development platform and processes let our software engineers focus more on innovation than on infrastructure. You will drive the adoption of observability best practices and develop automations for resolving recurring issues. You must be comfortable working with software engineering teams and supporting their demanding needs to ensure the security, availability, and performance of the platform. You must be capable of triaging issues on the front line as well as framing strategic initiatives from leadership. Hands-on keyboard work is a must for this role, with a focus on developing reliability engineering for MUSA Platforms.

  • Set standards for the monitoring of MUSA on-prem and Cloud infrastructure and applications.
  • Ensure the platform target SLAs are met and implement appropriate SLIs for supporting services.
  • As a key member of the Critical Incident Response team, use expert communication and troubleshooting skills to help the team resolve incidents efficiently.
  • Work with developers during service transition, evaluating reliability and operability of the applications and ensuring adequate monitoring, alerting and observability.
  • Partner with peers within Operations & Infrastructure supporting ongoing maintenance and enhancement of the platform.
  • Focus on setting standards for automating routine tasks and workflows in support of Infrastructure and Engineering teams.
  • Support multiple internal stakeholders with a variety of technical challenges.
  • Analyze and discern patterns in the variety of issues that arise and propose solutions to these problems.
  • Work in a 24/7/365 operation model and be available for shift or on-call support, including weekends.
  • 5 or more years of work experience with a bachelor’s degree or 4 or more years of relevant experience with an advanced degree.
  • Master’s Degree in IT, CS or related field preferred and/or 5+ years relevant work experience.
  • Hands-on experience in Linux and Windows systems and good understanding of distributed computing environments.
  • Intermediate level programming and/or scripting in 3 or more of the following: Python, PowerShell, JavaScript, Terraform, Ansible, etc.
  • 2+ years of experience managing CI/CD tooling such as Jenkins, GitHub, Bitbucket, DevOps in a large-scale environment.
  • 3+ years of experience managing observability tooling such as Splunk, Dynatrace, etc. in a large-scale environment.
  • Advanced understanding of YAML, JSON, HTML, XML.
  • 2+ years of work experience supporting relational and non-relational databases (MySQL, MongoDB, PostgreSQL, etc.), including creating and running queries, managing performance and scaling.
  • 3 or more years leading a Platform, SRE or Production Engineering group for high availability/critical platforms/applications.
  • Experience managing a distributed platform including but not limited to deployment/release management, provisioning, capacity management, workload management.