Site Reliability Engineer

Fireblocks

52d•$150,000 - $185,000•Remote

About The Position

The world of digital assets is accelerating in speed, magnitude, and complexity, opening the door to new ways to leverage the blockchain. Fireblocks’ platform and network provide the simplest and most secure way for companies to work with digital assets and are trusted by some of the largest financial institutions, banks, globally recognized brands, and Web3 companies in the world, including BNY Mellon, BNP Paribas, ANZ Bank, Revolut, and thousands more.

About The Team

Join a newly established, mission-critical SRE team at the forefront of financial infrastructure reliability. As part of Fireblocks Trust’s commitment to operational excellence, our Site Reliability Engineering team serves as the backbone of production systems, ensuring world-class uptime and performance for our digital asset custody and settlement platform.

Requirements

At least 3 years of experience as an SRE or Infra/Backend engineer in a SaaS environment.
You are curious, self-motivated, easy to work with, responsible, and production aware. You are a fast learner, able to take a project from POC to production while handling decision-making and communication.
Experience with coding languages: Python/JavaScript/Bash (must)
At least 3 years of experience with alerting and monitoring systems such as DataDog / Coralogix / Splunk / New Relic / Prometheus
Experience working with Linux systems from kernel to shell and beyond
Experience with cloud platforms such as AWS / Google Cloud / Azure
Experience with configuration management tools such as Ansible/Chef/Puppet/ArgoCD
Experience with Docker, Kubernetes and Helm
SCM: Git/Bitbucket/GitLab/Phabricator/Gerrit
Strong analytical and troubleshooting skills, with the ability to solve complex problems
Strong verbal and written communication skills and a collaborative mindset

Nice To Haves

Previous experience with cryptocurrencies / blockchains (a big advantage)
In-depth knowledge of Linux optimization, nginx, ArgoCD, DataDog, MySQL
Participation in Kubernetes migration projects
Previous experience as a C++ or Node.js developer
BSc in Computer Science or related technical certifications

Responsibilities

Improve existing and establish new monitoring, alerting, and observability for services using a wide range of tools.
Handle critical alerts and incidents and work directly with R&D to improve and optimize availability.
Research Fireblocks blockchain workflows, identify issues and optimization opportunities, and improve monitoring.
Help identify root causes of incidents and prevent them from recurring.
Coordinate and resolve outages by working with multiple teams.
Improve and establish alerting for our infrastructure, services, and business logic.
Work closely with R&D and Support, offering education and guidance on integration, support, and monitoring across the toolset.
Communicate and escalate issues to senior management in R&D and Support, write RCAs, and define next steps.
Document actions in runbooks, then turn them into automation using Python, Lambda, shell scripts, ArgoCD, and Ansible.
Focus on the system's observability, availability, reliability, performance/latency, and monitoring.
Perform periodic on-call duties and emergency response.