Sr. Site Reliability Engineer
MobileCoin · Posted: March 27, 2023 · Remote
About the position
The Site Reliability Engineer will join MobileCoin's infrastructure team with a focus on system performance, reliability, and observability. This is a unique opportunity for a seasoned engineer and technologist to have a large impact on a senior, brilliant team at an early stage of development. It is also a chance to hone your DevOps and software engineering skills in a system that is refreshingly and challengingly different from standard multi-tier, web-based microservice systems. Responsibilities include maintaining, monitoring, and improving Kubernetes clusters; assisting development teams in running, packaging, deploying, and troubleshooting applications; and identifying automation opportunities. Required skills include a minimum of 5 years of experience in cloud-based systems operations, extensive experience with Kubernetes and Docker, and experience with CI pipelines and Jenkins.
Responsibilities
- Maintain, monitor and improve Kubernetes clusters
- Assist development teams in running, packaging, deploying and troubleshooting applications
- Work with developers on streamlining deployment processes with Jenkins and other tooling
- Maintain and improve multiple internal services, such as Kubernetes, Prometheus, and logging
- Monitor, triage and respond to alerts in a 24/7/365 environment
- Participate in design and code reviews, and ensure that the foundation for services is best in class
- Evaluate new technologies, and design and implement them as appropriate
- Identify automation opportunities and implement them using custom or off-the-shelf solutions
Requirements
- Minimum 5 years of experience working in cloud-based systems operations, Linux systems administration, SRE or DevOps engineering
- Comfortable with Linux command line
- Extensive experience with Kubernetes and Docker
- Experience with Prometheus and Grafana or other monitoring systems
- Experience with CI pipelines and Jenkins
- Good understanding of computer networking, TCP/IP, load balancing, distributed computing, web services, and fundamental protocols used by the internet (HTTP, HTTPS, DNS, etc.)
- Experience supporting production workloads and familiarity with monitoring concepts and tooling
- Proficient in at least one scripting language and familiar with a few others (Python, Bash, etc.)
- Security-minded and follows standard security best practices (least-privilege, common attack defenses, etc.)
- Enthusiastic about working in a small, growing team; open, empathetic, and committed to putting the best ideas forward in a collaborative and helpful manner
- Ability to work independently and deliver results without supervision
- Preferred: experience with Azure, Terraform, Rust and/or C/C++, and advanced CPU features in a container environment (SGX, GPU, etc.)