Site Reliability Engineer

Harbor Compliance

About The Position

The Site Reliability Engineer is a senior-level technical leader responsible for the proactive design, implementation, and predictable management of our business-critical Linux infrastructure. You will collaborate cross-functionally with Software Development and technical stakeholders to execute resilient infrastructure strategies that support high-growth business goals. Success in this role is defined by delivering scalable technical solutions and consistently maintaining exceptional system performance and reliability.

Requirements

4–7 years of professional experience building and managing resilient, modern infrastructure within a fast-paced environment.
Expert-level proficiency in managing and troubleshooting Linux-based servers across multiple distributions.
Advanced capability in developing modular, reusable infrastructure templates using tools such as Terraform and Ansible.
Proven success in managing containerized workloads at scale using Kubernetes and Helm.
Extensive experience configuring and optimizing high-performance database environments, specifically MySQL.
Demonstrated ability to build robust, secure CI/CD deployment pipelines that include automated rollback and quality gates.
Strong technical documentation skills, including the creation of architectural diagrams, detailed specifications, and operational playbooks.
Ability to lead cross-functional projects independently while mentoring junior engineers and driving team-wide initiatives.

Nice To Haves

Deep understanding of observability platforms such as New Relic, Datadog, or Prometheus to measure and improve system reliability.
Expertise in designing secure cloud networking strategies including firewalls, VPNs, and identity management best practices.
Advanced scripting and programming proficiency in Python or similar languages to automate complex operational workflows.
Strategic insight into infrastructure ROI and the ability to align technical roadmaps with broad business priorities.
Practical knowledge of disaster recovery planning and the execution of failure-resilient system designs.

Responsibilities

Design and execute a comprehensive infrastructure strategy that proactively supports evolving business requirements and operational excellence.
Own the predictable delivery of high-complexity technical solutions through deep automation using Kubernetes and sophisticated CI/CD pipelines.
Maintain superior portal availability and system health by implementing advanced observability and distributed tracing strategies.
Lead high-severity incident response efforts and drive systemic improvements through insightful, blameless postmortem analysis.
Architect failure-resilient and self-healing infrastructure systems to ensure continuous operational stability and zero data loss.
Serve as the internal subject matter expert to influence software architecture decisions toward maximum scalability and performance.
Facilitate regular knowledge-sharing and training sessions to elevate technical standards and process predictability across the entire technology department.
Direct security initiatives and design secure networking strategies to maintain a high standard of protection for all client data and assets.