Reliability Engineer (On-Premise Focus)

Proscia, Philadelphia, PA
Onsite

About The Position

As a Reliability Engineer, you will own the reliability, performance, and operational excellence of Proscia’s on-premise installations at customer sites. Our platform powers high-resolution digital pathology and AI-assisted workflows in clinical and research environments, often running on customer-managed infrastructure. You’ll ensure these deployments are stable, performant, secure, and continuously improving. This is a hands-on role focused on container-based on-premise deployments, systems performance, and real-world operational problem solving in complex customer environments.

Requirements

  • Deep hands-on experience deploying and operating containerized applications in production environments with tools such as Docker and Docker Compose.
  • Strong Linux systems expertise (process management, networking, storage, security hardening, performance tuning).
  • Expert troubleshooting skills in distributed systems across application, container, and infrastructure layers.
  • Experience with enterprise networking technologies and the ability to troubleshoot customer infrastructure and recommend fixes.
  • Familiarity with operating software in customer-managed or on-premise environments.
  • Experience supporting data-intensive systems, ideally involving large image files or compute-heavy workloads.
  • Working knowledge of observability practices (logs, metrics, tracing) and pragmatic monitoring approaches in non-cloud-native environments.
  • Comfort working directly with customers or customer-facing teams to resolve high-impact issues.
  • Demonstrated AI fluency: hands-on experience using tools like Claude, ChatGPT, GitHub Copilot, or similar AI systems to enhance productivity, automate tasks, and solve technical problems.
  • A mindset aligned with Proscia’s values: ownership, speed, simplification, and a willingness to challenge the status quo.

Nice To Haves

  • Experience with healthcare or regulated environments.
  • Exposure to Kubernetes (for hybrid or future-state deployments).
  • Experience with infrastructure automation or configuration management tools.
  • Familiarity with database performance tuning for large datasets.
  • Experience supporting GPU-enabled workloads.

Responsibilities

  • Deploy, configure, and support Proscia’s container-based application stack in on-premise customer environments.
  • Own system reliability across customer installations, including uptime, performance, backup/recovery, and upgrade workflows.
  • Diagnose and resolve production incidents, performing deep root cause analysis across application, container, host, storage, and networking layers.
  • Optimize performance for large image datasets and AI workloads running on customer-managed compute infrastructure.
  • Improve installation automation, configuration management, and repeatability across diverse environments.
  • Develop and refine monitoring, logging, and alerting patterns appropriate for customer-hosted deployments.
  • Collaborate closely with Engineering, Customer Success, and Support to translate field learnings into product and operational improvements.
  • Document best practices and create operational playbooks for internal teams and customers.
  • Leverage AI tools (e.g., Claude, code assistants, automation frameworks) to streamline troubleshooting, scripting, and operational workflows.

Benefits

  • In addition to competitive pay, we ensure everyone on our team is supported with savings, schedule, and insurance options that promote long-term health and personal growth.