Systems Administrator

Blueprint Technologies, Redmond, WA
$33 - $35, Hybrid

About The Position

You will support a research-focused organization by designing, building, and maintaining High‑Performance Computing (HPC) clusters used for advanced scientific workloads. This position plays a key part in a major effort to migrate HPC environments into a more secure, custom tenant. You will work hands‑on with Linux systems, automation tooling, and distributed compute environments while collaborating with researchers and engineering teams. This is an onsite‑hybrid role that requires you to be in Redmond at least three days per week.

Requirements

  • Linux systems administration experience with strong proficiency in the Linux terminal.
  • Hands‑on experience with automation tools such as Python, Ansible, and Terraform.
  • Experience supporting distributed, multi‑user systems.
  • Bachelor’s degree in Computer Science, Engineering, or related technical field.
  • 2+ years of experience working with Linux‑based systems in production, lab, or research computing environments.
  • Experience debugging production or pre‑production systems (performance, reliability, or configuration issues).
  • Experience writing scripts or tools for automation, diagnostics, or operations.
  • Strong troubleshooting, analytical, communication, and organizational skills.
  • Ability to learn existing platforms and contribute to iterative improvements.

Nice To Haves

  • Experience with High‑Performance Computing (HPC), either supporting HPC platforms or developing parallel/accelerated applications.
  • Familiarity with HPC schedulers, especially Slurm.
  • Experience with HPC offerings in Azure.
  • Experience with containers/microservices technologies.
  • Background supporting large‑scale networking or storage systems in Linux environments.

Responsibilities

  • Build, deploy, and maintain HPC cluster infrastructure, including compute, storage, and networking components.
  • Develop automation for cluster deployment, configuration, and lifecycle management using tools such as Python, Ansible, and Terraform.
  • Diagnose and resolve platform-level issues affecting performance, reliability, and workload execution.
  • Participate in the full DevOps lifecycle, including code development, code reviews, testing, and production operations.
  • Validate HPC platforms for readiness, internal security compliance, and research team usage.
  • Interact directly with users to troubleshoot simulation and workload issues.
  • Maintain clear, accurate documentation for architecture, deployment processes, and operational procedures.
  • Support a major migration of HPC systems into a specialized secure environment.

Benefits

  • Medical, dental, and vision coverage
  • Flexible Spending Account
  • 401k program
  • Competitive PTO offerings
  • Parental Leave
  • Opportunities for professional growth and development