Site Reliability Engineer

MoonPay

Posted: August 29, 2023

Remote

Job Commitment

Full-time

Experience Level

Mid Level

Workplace Type

Remote

Job Function

Dev & Engineering

This job is closed

About the position

MoonPay is seeking a Site Reliability Engineer to join its team. The SRE will be responsible for providing a resilient and secure platform for deploying applications and services. The work includes improving infrastructure, building monitoring mechanisms, load testing, and maintaining Kubernetes clusters. In the long term, the SRE will implement new technologies, automate processes, track metrics, and collaborate with other engineering functions. The role requires strong systems administration skills and experience in platform engineering/SRE.

Responsibilities

Provide a resilient, secure, production-ready platform for deploying applications and services in a self-serve, repeatable manner.
Support product delivery and operational teams by surfacing data from the production environment and driving meaningful change based on insights.
Improve the maintainability of infrastructure as code.
Build dashboards, monitoring, and alerting mechanisms using Datadog.
Conduct load testing and performance tuning of production services.
Manage the lifecycle and maintenance of Kubernetes clusters.
Implement new technologies on top of Kubernetes to ensure scalability.
Develop and integrate automation solutions to improve reliability and facilitate recovery.
Design and track metrics for site uptime and performance.
Own deployment pipelines and continuously improve monitoring and alerting capabilities.
Collaborate with other engineering functions to provide timely feedback.
Support Engineering in delivering better software faster and more safely.

Requirements

Strong systems administration skills
Knowledge of the difference between a container and a virtual machine
Familiarity with the Linux terminal
Platform engineering/SRE experience at leading startups or fast-growing tech companies

Benefits

Resilient and secure production-ready platform
Self-serve and repeatable deployment of applications and services
Surfacing data from the production environment and driving meaningful change
Opportunity to work with a leading web3 infrastructure company
End-to-end solutions for payments, smart contract development, and digital asset management
Opportunity to work with iconic brands
Increasing the resiliency and reliability of the PaaS solution
Building dashboards, monitoring, and alerting mechanisms
Load testing and performance tuning of production services
Implementing new technologies on top of Kubernetes
Automation solutions to improve and maintain reliability
Ownership of deployment pipelines and continuous improvement of monitoring and alerting capabilities
Collaboration with other engineering functions
Support in delivering better software, faster and more safely
Platform engineering/SRE experience
Cross-training and upskilling opportunities
Experience in a regulated industry
Experience in monitoring and logging of complex systems at scale
Collaboration with different teams
Opportunity to forge and own reliability and recovery processes
Understanding of complex reliability structures, theories, principles, and best practices
Experience with JavaScript codebases and frameworks (e.g., TypeScript, Node.js, React)
Emphasis on culture add and diversity experience
Interview process with multiple stages
Accommodations available for the interview process