DevOps Engineer II

doTERRA, Pleasant Grove, UT

About The Position

At doTERRA we encourage all employees to seek out opportunities that will expand their skill set. We strive to help employees achieve their personal career goals by providing opportunities for growth and movement throughout the company.

Job Description: Site Reliability Engineering (SRE) applies software engineering techniques and discipline to production operations to attack major problems and fix them for good. SRE also participates in architecture planning to help ensure the architecture will scale and carry little to no maintenance burden or technical debt. SRE assigns technical debt tasks to developers and helps prioritize those tasks by working with their development leads. SRE is on call to keep software services available and fast. SRE provides performance reports, tuning recommendations, and code evaluations for systems performance improvements.

Requirements

  • Strong ability to anticipate and solve performance-related issues, provide feedback on architecture designs from a scalability perspective, and recommend code changes for performance improvements
  • Knowledge of best practices and IT operations in an always-up, always-available service
  • Strong ability to follow checklists with attention to detail and help build checklists to avoid errors in processes
  • Strong experience with system monitoring tools such as SolarWinds, AppDynamics, and Splunk
  • Strong background in Linux/Unix Administration (shell scripts)
  • Strong experience with automation/configuration management tools (Git, Jenkins, SaltStack, Puppet, Chef, Ansible or an equivalent)
  • Fluency in at least one scripting language (Python, Perl, Ruby or equivalent)
  • Ability to use a wide variety of open source technologies and cloud services
  • Experience with SQL and MySQL (NoSQL experience is a plus)
  • Experience integrating Git, JUnit testing, and Selenium in continuous deployment
  • Strong experience with continuous integration and delivery processes and tools
  • Bachelor’s degree preferred
  • General web development background preferred (Java/J2EE, JSP, HTML, CSS, JavaScript, Ajax, Spring)
  • 2+ years’ experience in the use of Maven, Ant or Gradle
  • Understanding of SRE philosophy, technologies, platforms and tools, SLA management, incident resolution, and automation
  • Understanding of container platforms such as Docker and Kubernetes
  • Experience working with at least one public cloud platform (AWS, GCP, or Azure)

Nice To Haves

  • History of working in a fast-paced e-commerce or scale-up environment preferred
  • Experience deploying and supporting international sites is a big plus
  • Experience with application source code analysis via a quality gate in the build process (e.g., SonarQube)
  • Experience in engineering solutions for metrics gathering/publishing and event collection/correlation across distributed architectures, automation, monitoring, intelligent alerting, random fault injection (Chaos Engineering), and self-healing

Responsibilities

  • Improve service reliability through root cause analysis, blameless postmortems, and using code to prevent or respond to problem recurrence
  • Participate in the entire software lifecycle including design, delivery, measurement, and learning
  • Design, write, ship, and motivate the creation of software and systems to increase product reliability and organizational efficiency
  • Support the entire software lifecycle by assisting the architects with reviewing designs, creating platforms and frameworks, capacity planning, and chaos testing
  • Maintain service health through monitoring and follow-the-sun incident response by working with the hosting companies, Basis team, and developers