Reliability Engineer (On-Premise Focus)

Proscia•Philadelphia, PA

Onsite

About The Position

As a Reliability Engineer, you will own the reliability, performance, and operational excellence of Proscia’s on-premise installations at customer sites. Our platform powers high-resolution digital pathology and AI-assisted workflows in clinical and research environments, often running on customer-managed infrastructure. You’ll ensure these deployments are stable, performant, secure, and continuously improving. This is a hands-on role focused on on-premise, container-based deployments, systems performance, and real-world operational problem solving in complex customer environments.

Requirements

Deep hands-on experience deploying and operating containerized applications in production environments using tools such as Docker and Docker Compose.
Strong Linux systems expertise (process management, networking, storage, security hardening, performance tuning).
Expert troubleshooting skills in distributed systems across application, container, and infrastructure layers.
Experience with enterprise networking technologies, and the ability to troubleshoot customer infrastructure and recommend fixes.
Familiarity with operating software in customer-managed or on-premise environments.
Experience supporting data-intensive systems, ideally involving large image files or compute-heavy workloads.
Working knowledge of observability practices (logs, metrics, tracing) and pragmatic monitoring approaches in non-cloud-native environments.
Comfort working directly with customers or customer-facing teams to resolve high-impact issues.
Demonstrated AI fluency: hands-on experience using tools like Claude, ChatGPT, GitHub Copilot, or similar AI systems to enhance productivity, automate tasks, and solve technical problems.
A mindset aligned with Proscia’s values: ownership, speed, simplification, and a willingness to challenge the status quo.

Nice To Haves

Experience with healthcare or regulated environments.
Exposure to Kubernetes (for hybrid or future-state deployments).
Experience with infrastructure automation or configuration management tools.
Familiarity with database performance tuning for large datasets.
Experience supporting GPU-enabled workloads.

Responsibilities

Deploy, configure, and support Proscia’s container-based application stack in on-premise customer environments.
Own system reliability across customer installations, including uptime, performance, backup/recovery, and upgrade workflows.
Diagnose and resolve production incidents, performing deep root cause analysis across application, container, host, storage, and networking layers.
Optimize performance for large image datasets and AI workloads running on customer-managed compute infrastructure.
Improve installation automation, configuration management, and repeatability across diverse environments.
Develop and refine monitoring, logging, and alerting patterns appropriate for customer-hosted deployments.
Collaborate closely with Engineering, Customer Success, and Support to translate field learnings into product and operational improvements.
Document best practices and create operational playbooks for internal teams and customers.
Leverage AI tools (e.g., Claude, code assistants, automation frameworks) to streamline troubleshooting, scripting, and operational workflows.

Benefits

In addition to competitive pay, we ensure everyone on our team is supported with savings, schedule, and insurance options that promote long-term health and personal growth.
