Dev Ops Engineer (Site Reliability)

Nintendo | Redmond, WA
304d | $102,500 - $164,000 | Onsite

About The Position

Nintendo of America Inc. is seeking a Dev Ops Engineer (Site Reliability) to join its Site Reliability Engineering (SRE) team within the NOA IT: Enterprise Architecture department. The SRE team supports operations and engineering for several mission-critical platforms such as Nintendo eShop, Grand Prix, Nintendo Developer Portal, and Magento. The role involves designing, building, and maintaining reliable and scalable applications on Linux servers, both on-prem and in the cloud. The engineer will partner with development teams, collaborate with peers, and lead efforts in requirements gathering, testing, and operational issue resolution.

Requirements

  • Minimum of five (5) years of related experience in various system development technologies, patterns and practices.
  • Experience with automation methodologies such as continuous integration/delivery (CI/CD).
  • Experience participating in or leading the planning and execution of small to moderately complex projects.
  • Proficiency in one or more programming languages (Python); experience writing, documenting, bundling and publishing code for reuse by others is extremely helpful.
  • Proficiency in the setup, configuration, maintenance, and upgrading of one or more server operating system families (Linux, Windows, etc.).
  • Proficiency with managing GitHub (Administration, GitHub Actions, etc.) preferred.
  • Proficiency with cloud-based web services platforms (AWS) and services (EKS, ECS, CloudFront, Lambda, S3, etc.).
  • Proficiency with containerization and orchestration technologies (Kubernetes, Docker, etc.).
  • Some experience with SDLC processes (code review, release management, etc.) and their automation.
  • Some experience with networking equipment, protocols (TCP/IP, SSL, etc.) and troubleshooting tools.
  • Bachelor of Science degree in Computer Science, Computer Engineering, Electrical Engineering, Information Technology, Information Systems, Industrial Engineering, or related field; or equivalent combination of education and experience.

Responsibilities

  • Designs, builds and maintains reliable and scalable applications on Linux servers, both on-prem and in the cloud.
  • Partners with development teams by providing infrastructure assistance and guidance from the early phases of product development.
  • Collaborates with engineers in peer teams to develop solutions that meet reliability goals, leveraging automation and process.
  • Submits software fixes for deficiencies within the area of expertise or operational responsibility.
  • Owns, drives and is accountable for KPIs pertaining to reliability, resiliency, durability, and uptime, and contributes to defining SLOs.
  • Leads and directs efforts for gathering requirements, building and implementing test plans, performing quality reviews, and fixing operational issues.
  • Troubleshoots, evaluates, and resolves system and application challenges.
  • Builds automation to reduce the cost of errors, improve CI/CD processes, and increase developer efficiency.
  • Performs peer review of solutions developed by others, ensuring that best practices and internal standards are being followed.
  • Runs infrastructure as code with Puppet/CloudFormation/Terraform/CDK.
  • Participates in a 24x7 on-call cycle to support and troubleshoot products and solutions in multiple environments.
  • Develops metrics, dashboards, and alerts in New Relic using Terraform.

Benefits

  • Medical insurance
  • Dental insurance
  • Vision insurance
  • 401(k)
  • Paid time off
  • Potential for a semi-annual discretionary performance bonus