Cutover • Posted 2 months ago
$120,000 - $130,000/Yr
Full-time • Mid Level
101-250 employees

Cutover provides enterprise technology operations teams with an AI-powered SaaS solution that automates and streamlines complex processes with intelligent runbooks. The Cutover solution enables teams to respond to incidents quickly, recover from IT outages, and manage cloud migrations with precision and efficiency. Cutover is used by many of the world's largest financial institutions to support their critical technology operations, including 5 of the 6 largest asset managers and 3 of the top 5 US banks.

We’re looking for a Site Reliability Engineer (SRE) to join our US team, reporting to our SRE Lead. Cutover’s SRE team is responsible for ensuring the reliability and performance of our production systems and applications. As a team, we’re committed to continually improving our engineering culture to maintain a balance between risk and reliability.

  • Respond to incidents and alerts, triaging urgency and investigating root cause
  • Contribute regularly to our documentation on system design, troubleshooting, best practices, and engineering processes
  • Contribute to post-mortems and help identify long-term improvements under guidance
  • Support cross-functional teams during investigations and post-incident reviews
  • Support and enhance observability tools and techniques by identifying metrics, logging, and alerting improvements
  • Write and execute simple automation scripts (e.g. Python, Ruby, Bash) to improve reliability and reduce toil
  • Work on internal tools, pipelines, and IaC solutions to help improve the speed of software delivery and recovery
  • Work on efforts to enhance the reliability and performance of our application and systems, ensuring optimal uptime and minimal disruptions
  • Work closely with the development and platform engineering teams to optimize the infrastructure on AWS, ensuring scalability and efficiency

  • A genuine excitement for solving complex problems within our tech stack, applying what you know to our unique problems
  • Familiarity with at least one scripting language such as Ruby, JavaScript, Python, Bash
  • Experience with containerization (e.g. Docker) or IaC (e.g. Terraform, Helm, CloudFormation)
  • An eagerness to follow modern engineering practices and learn from others
  • Familiarity with observability tools such as Datadog, New Relic, Grafana, Prometheus, ELK, or OpenTelemetry
  • Understanding of core networking concepts (DNS, HTTP/S, Load Balancing, etc.)
  • A collaborative mindset with clear communication skills
  • A willingness to ask questions to better understand new or complex concepts
  • Exposure to major incident response processes
  • AWS Certified Cloud Practitioner or hands-on experience with cloud environments

  • Share options as part of our compensation package
  • 20 days of PTO per year + public holidays
  • 3 volunteer days to use for any charitable/voluntary cause
  • A top-tier private health insurance package
  • 401k contribution plan
  • Work from home stipend
  • A personal learning and development budget through Learnerbly
  • Globally consistent parental leave approach
  • Employee Referral Scheme
  • Multiple Cutover mental health initiatives, including fully subsidised therapy sessions