Senior Site Reliability Engineer

Airlock Digital | San Francisco, CA
8d | $148,000 - $185,000 | Remote

About The Position

Airlock Digital is a global leader in application control and allowlisting. We seek to empower every organization to run only what they trust and operate free from malware and ransomware. With rapid growth across Australia, North America, and EMEA, we support a diverse and expanding global customer base. We are committed to our core values: respect, determination, and integrity. At Airlock, we pride ourselves on being a team of humble, collaborative, and driven professionals who support one another and share a passion for cybersecurity.

The Senior Site Reliability Engineer (SSRE) is responsible for ensuring the reliability, scalability, performance, and efficiency of our systems, applications, and services. The SSRE works closely with cross-functional teams such as development, operations, and infrastructure to proactively identify, troubleshoot, and resolve issues, ensuring optimal performance and uptime.

Requirements

  • 5+ years of hands-on experience in Site Reliability and Observability Engineering, DevOps, or Infrastructure Engineering, including debugging, diagnosing, and resolving high-severity incidents.
  • Commercial experience in at least one programming language, such as Python or Go.
  • Solid experience with automation tools such as Ansible and containerization tools like Docker and Podman.
  • Deep understanding of distributed systems, networking, operating systems, and cloud computing.
  • Strong troubleshooting and problem-solving skills, and experience in incident response, root cause analysis, and post-mortem activities.
  • Systematic problem-solving approach, coupled with effective communication skills and a sense of ownership and drive.

Responsibilities

  • Design, implement, and maintain highly available, scalable, and fault-tolerant systems and services.
  • Introduce best practices into Airlock Digital around observability, SLOs, and reliability.
  • Continuously monitor the performance, availability and security of Airlock Digital systems and services and proactively identify and resolve issues.
  • Identify areas for improvement across the organization and drive engineering-wide technical change in the field of site reliability.
  • Collaborate with cross-functional teams to implement and maintain deployment pipelines, monitoring tools and automated testing frameworks.
  • Develop and maintain documentation of systems, processes, and procedures to ensure knowledge transfer and continuity.
  • Lead incident response, root cause analysis and post-mortem activities to identify and address underlying issues.
  • Work with Software Developers to design and implement scalable and resilient applications, services, and infrastructure.
  • Participate in the on-call rotation to ensure 24/7 support for critical systems and services.

Benefits

  • Medical, dental, and vision insurance
  • 401K Plan with 4% Company Match
  • Life and Disability Programs
  • Paid Parental Leave
  • Paid time off and Paid Holidays
  • Volunteer and Birthday Time off
  • Home Office Allowance