Site Reliability Engineer

Nusano, West Valley City, UT
Onsite

About The Position

The Site Reliability Engineer supports the deployment and operation of Linux-based, containerized microservice clusters that power a state-of-the-art particle accelerator used for medical radioisotope production supporting cancer care, scientific research, and advanced technology applications. This role works directly with physical accelerator systems, where software reliability, infrastructure resilience, and real-time data availability are critical to continuous operations. Reporting to the Director of CSEE, the Site Reliability Engineer collaborates closely with engineers and scientists to integrate software and hardware systems responsible for accelerator control, data acquisition, and data management. This hands-on role focuses on automation, observability, and operational stability within an on-premise production environment, helping ensure reliable platform operations and secure storage, monitoring, and retrieval of accelerator time-series and imaging data essential to system performance.

Requirements

  • Bachelor's degree in Computer Science or related field
  • 3-5 years of professional experience in a DevOps/SRE capacity
  • Hands-on experience working in a production environment with on-premise hardware
  • Proficiency in deploying and configuring containers and container orchestration platforms (Docker, Kubernetes)
  • Experience implementing and maintaining real-time observability tools (Grafana, Prometheus, or similar)
  • Experience deploying and configuring CI/CD pipelines (GitLab CI, Jenkins, Argo, or similar)
  • Understanding of network fundamentals including enterprise routing and switching
  • Strong Linux command-line skills and system administration experience
  • Working knowledge of cybersecurity best practices

Nice To Haves

  • Exposure to or familiarity with the EPICS framework
  • Experience maintaining Linux systems (Red Hat, Rocky, CentOS)
  • Programming experience in Python and/or C/C++ with object-oriented principles
  • Experience with databases and object stores (ElasticSearch, MariaDB, S3, MongoDB)
  • Experience in highly regulated or safety-critical environments

Responsibilities

  • Respond to and resolve production issues and outages under the guidance of senior team members
  • Support observability initiatives by implementing and maintaining monitoring dashboards and alerting systems
  • Develop and maintain infrastructure-as-code scripts, automation tools, system images, virtual machines, and containers to support accelerator operations
  • Assist in identifying and documenting system vulnerabilities and potential points of failure
  • Document processes, procedures, and troubleshooting steps in the group knowledge base
  • Deploy and configure vendor-supplied software utilities and packages built on the EPICS framework

Benefits

  • Comprehensive medical, dental, and vision coverage for employees and their eligible dependents
  • 401(K) Retirement Plan
  • Company-paid life insurance & AD&D coverage
  • Company-paid short-term and long-term disability coverage
  • High-Deductible Health Plan (HDHP) option with company-funded Health Savings Account (HSA)
  • Healthcare Flexible Spending Account (FSA)
  • Dependent Care Reimbursement Account (DCRA)
  • Voluntary Life Insurance
  • Voluntary benefits such as Critical Illness, Accident, Hospital, and Pet Insurance
  • Employee Assistance Program (EAP)
  • Vacation, Sick Time, and Holidays