Sr. Reliability Engineer

SPS Commerce, Rogers, AR
8d
Hybrid

About The Position

SPS Commerce is a leading provider of cloud-based supply chain management solutions, serving a global network of retail trading partners. We foster a collaborative and inclusive work environment where innovation and continuous improvement are highly valued. Join SPS Commerce and be part of a dynamic team that's transforming the global retail supply chain!

Position Summary: The Sr. Site Reliability Engineer approaches operations as a software problem and applies software engineering approaches to it. The SPS SRE team is responsible for delivering highly available platform services and deployment automation that empower our product engineering teams with services that are secure, reliable, and cost-effective, and that foster speed. Working within a fast-paced and collaborative environment, the SRE team partners with development teams to deliver market-leading products and services. The team also uses automation and other technologies to cope intelligently with challenging failures, and collaborates with various engineering organizations to resolve failure risks at the source.

Requirements

  • 5+ years of IT experience with a Bachelor's degree, 3 years with a Master's degree, or 1 year with a PhD; or equivalent work experience
  • Experience in Python and/or TypeScript, with a software engineering mindset
  • Experience working with AWS including RDS
  • Experience with or interest in platform and service mesh technologies such as Docker and Kubernetes
  • Experience building or operating CI/CD pipelines or other deployment automation solutions
  • Experience administering Linux
  • Experience working within an Agile development methodology, including task execution
  • Experience with immutable and scalable infrastructure (infrastructure as code concepts)
  • Demonstrated understanding of networking systems and of various identity and authorization systems
  • Consistently demonstrated superior problem-solving and collaboration skills

Nice To Haves

  • Experience with advanced monitoring solutions such as metrics platforms, logging, and distributed tracing
  • Experience working with Databricks or Snowflake

Responsibilities

  • Engineer and maintain highly available, secure, and cost-effective container orchestration platforms such as Kubernetes and ECS
  • Engineer Continuous Integration & Continuous Delivery (CI/CD) solutions that simplify and improve software deployments to enable high velocity for our Product Engineering partners
  • Develop robust monitoring and observability services and patterns to consistently improve the team's ability to identify, react to, respond to, and recover from complex failures
  • Collaborate with Technology Engineering, Development, and Product Management to help develop, scale, and improve production systems and services
  • Partner with service teams to provide appropriate documentation, cross-training, architecture planning, capacity management, and recommendations for future state
  • Engineer technical solutions to prevent or reduce the frequency of failures
  • Help drive code quality practices within the team and work hard to deliver maintainable software
  • Participate in screening, interview panels, and other hiring-related activities when required