DevOps Engineer II

doTERRA • Pleasant Grove, UT

About The Position

At doTERRA we encourage all employees to seek out opportunities that will expand their skill set. We strive to help employees achieve their personal career goals by providing opportunities for growth and movement throughout the company.

Job Description: Site Reliability Engineering (SRE) applies software engineering techniques and discipline to production operations to attack major problems and fix them for good. SRE also participates in architecture planning to help ensure the architecture will scale and carry little to no maintenance burden or technical debt. SRE assigns tasks to developers regarding technical debt items and helps prioritize those tasks by working with their development leads. SRE is on call to keep software services available and operating fast. SRE provides performance reports, tuning recommendations, and code evaluations for system performance improvements.

Requirements

Strong ability to anticipate and solve performance-related issues, provide feedback on architecture designs from a scalability perspective, and recommend code changes for performance improvements
Knowledge of best practices and IT operations in an always-up, always-available service
Strong ability to follow checklists with attention to detail and help build checklists to avoid errors in processes
Strong experience with system monitoring tools such as SolarWinds, AppDynamics, and Splunk
Strong background in Linux/Unix Administration (shell scripts)
Strong experience with automation/configuration management tools (Git, Jenkins, SaltStack, Puppet, Chef, Ansible or an equivalent)
Fluency in at least one scripting language (Python, Perl, Ruby or equivalent)
Ability to use a wide variety of open source technologies and cloud services
Experience with SQL and MySQL (NoSQL experience is a plus)
Experience integrating Git, JUnit testing, and Selenium into continuous deployment
Strong skills in continuous integration and delivery processes and tools
Bachelor’s degree preferred
General web development background preferred (Java/J2EE, JSP, HTML, CSS, JavaScript, Ajax, Spring)
2+ years’ experience in the use of Maven, Ant or Gradle
Understanding of SRE philosophy, technologies, platforms and tools, SLA management, incident resolution, and automation
Understanding of container platforms such as Docker and Kubernetes
Experience working with at least one public cloud platform, such as AWS, GCP, or Azure

Nice To Haves

History of working in a fast-paced e-commerce or scale-up environment preferred
Experience deploying or supporting international sites is a big plus
Experience with application source code analysis via a quality gateway in the build process (e.g. SonarQube)
Experience in engineering solutions for metrics gathering/publishing and event collection/correlation across distributed architectures, automation, monitoring, intelligent alerting, random fault injection (Chaos Engineering), and self-healing

Responsibilities

Improve service reliability through root cause analysis, blameless postmortems, and using code to prevent or respond to problem recurrence
Participate in the entire software lifecycle including design, delivery, measurement, and learning
Design, write, ship, and motivate the creation of software and systems to increase product reliability and organizational efficiency
Support the entire software lifecycle by assisting the architects with reviewing designs, creating platforms and frameworks, capacity planning, and chaos testing
Maintain service health through monitoring and follow-the-sun incident response by working with the hosting companies, Basis team, and developers
