Site Reliability Engineer

Specialized Dental Partners•Franklin, TN

$110,000 - $135,000

About The Position

Specialized Dental Partners is seeking a highly skilled Site Reliability Engineer (SRE) to ensure the reliability, scalability, and resilience of our hybrid on-premises and Azure-based platforms. This role focuses on designing systems and automation that keep critical clinical, infrastructure, and data platforms available, observable, and recoverable. The SRE partners closely with Infrastructure, Network, Server, Microsoft, Security, and Clinical Systems Engineering teams to reduce operational risk and eliminate classes of failure through engineering. This is a hands-on engineering role focused on reliability, automation, and systemic improvement, not reactive firefighting.

About Specialized Dental Partners

Specialized Dental Partners is one of the nation’s leading dental support organizations for Endodontic, Periodontic, and Oral Surgery practices. With more than 250 practice locations and over 430 doctors across the United States, Specialized Dental Partners empowers specialists to focus on exceptional patient care while the organization delivers tailored business, operational, and strategic support.

Requirements

6+ years of experience in a technology role
2+ years of operational cloud infrastructure implementation
2+ years working with Terraform, Ansible, or other automated deployment tooling
2+ years of experience with PowerShell, scripting, or similar tooling
2+ years providing operational IT support in live environments
3+ years supporting a SQL-driven data infrastructure
4+ years in a communication-centric business role
1+ year of healthcare IT experience (HIPAA-aligned)

Nice To Haves

Bachelor’s degree in Computer Science, Information Technology, or related field
Strong direct ownership and responsibility for client experience
Azure Data Factory (ADF) or data pipeline reliability experience
Hybrid Cloud and On-Prem infrastructure support capabilities
Knowledge of Root Cause Analysis (RCA) processes
Documentation, standard setting, and lifecycle ownership
Proven ability to communicate technical decisions to technical and non-technical stakeholders
Experience with HIPAA security standards and aligning systems with least-privilege principles
Experience with troubleshooting, diagnostics, and systemic resolution processes
Fabric-driven data environment infrastructure experience
Familiarity with SLIs, SLOs, and error budgeting
Hands-on experience with on-prem core infrastructure components
Automation-first solution development

Responsibilities

Reliability Engineering & Platform Health:
Design and implement reliability patterns across hybrid platforms (on-prem + Azure)
Define and measure service reliability using SLIs, SLOs, and error budgets
Oversee configuration, upgrades, integrations, and vendor coordination
Improve availability, performance, and recoverability of clinical and enterprise systems
Identify and eliminate systemic reliability risks before they cause incidents
Automation & Toil Reduction:
Automate repetitive operational tasks using PowerShell, scripting, and IaC
Partner with engineering teams to build self-healing and self-service capabilities
Reduce manual intervention and operational toil through design and tooling
Own and drive initiatives that proactively introduce automation practices across the group
Monitoring, Observability, and Alerting:
Design and maintain monitoring, logging, and alerting strategies
Ensure alerts are actionable, meaningful, and tied to business impact
Evolve alerting flows into full automations across standardized tooling
Improve visibility across infrastructure, applications, and data pipelines
Incident Response & Postmortems:
Participate in and lead incident response for high-severity outages
Drive blameless postmortems and root cause analysis
Ensure incidents result in durable engineering improvements
Partner with leadership on incident communications and lessons learned
Data & Integration Reliability:
Support the reliability of data pipelines and integrations, including ADF workloads
Ensure data movement, transformations, and dependencies are observable and recoverable
Bring evolutionary techniques to pipelines and integrations as a core member of the ADF infrastructure team
Partner with data and analytics teams to improve pipeline resilience
Security, Compliance, & Change Safety:
Ensure reliability practices align with security and compliance requirements (HIPAA-aligned)
Participate in change reviews and risk assessments
Design guardrails that allow teams to move fast without breaking systems