Senior Site Reliability Engineer (Remote)
Stord · Posted: August 17, 2023 · Remote
About the position
We are seeking a talented Senior Site Reliability Engineer with 7+ years of experience to join our team at Stord. As a Senior SRE, you will play a crucial role in improving and scaling our infrastructure, collaborating with the engineering team to enhance the reliability and productivity of our systems. This position will focus on the Order Management System (OMS) product within our suite of logistics tools, empowering brands with automation, routing, and order management capabilities. You will have the opportunity to impact all aspects of the development process, from ideation to maintenance and operations.
Responsibilities
- Collaborate with engineering teams to understand pain points and create full-stack applications and tools for a consistent self-service experience in building, deploying, and operating code in production.
- Identify and prioritize manual tasks for automation or self-service, writing code to embed operational knowledge.
- Build, secure, maintain, and monitor infrastructure for the production application and business analytics system using infrastructure as code practices.
- Support a services-based architecture using Docker and Kubernetes, applying the 12 Factor App methodology.
- Minimize reliability failures by implementing SLI/SLO practices and addressing availability risk through backup and recovery plans.
- Respond to alerts and troubleshoot production issues, participating in incident retrospectives to improve reliability.
- Incorporate information security commitments and requirements into IT processes and enforce IS policies and procedures.
Requirements
- 7+ years of experience as a Senior Site Reliability Engineer
- Strong expertise in building and operating in cloud-native environments on GCP, AWS, or Azure
- Proficiency in building fully automated deployment pipelines to production environments
- Familiarity with infrastructure-related tools such as Docker, Kubernetes, Helm, and Terraform
- Experience with observability frameworks and platforms like OpenTelemetry, Datadog, and Stackdriver
- Ability to collaborate across engineering teams and understand pain points
- Skilled in writing code to automate manual tasks and embed operational knowledge
- Knowledge of building, securing, maintaining, and monitoring infrastructure
- Understanding of services-based architecture using Docker and Kubernetes
- Familiarity with the 12 Factor App methodology
- Ability to minimize risk of reliability failures and address availability risk
- Experience in responding to alerts and troubleshooting production issues
- Strong commitment to information security and enforcing IS policies and procedures
Benefits
- Competitive salary and bonus
- Friendly, Passionate, and Intelligent Employee Base
- Creative Problem Solving and Entrepreneurial Thinking
- Fast-Paced Environment
- Low-Ego, Solution-Driven Culture
- Community Involvement and Volunteer Opportunities
- Employee Resource Groups: Women of Stord, JEDI (Justice, Equity, Diversity, & Inclusion), Stord-Serves, & More
- 401(k)
- Medical, Dental, and Vision Insurance
- Life and Disability Insurance
- Health Savings Account (HSA) option
- Employee Assistance Program (EAP) - Mental Health Resources
- Paid Parental Leave
- Gym Stipend
- Paid Time Off
- Paid Holidays
- And more!