Senior Site Reliability Engineer

Tactile Medical, Minneapolis, MN

About The Position

The Senior Site Reliability Engineer (SRE) at Tactile Medical is responsible for ensuring reliability, observability, and operational excellence across the company's digital products and internal platforms. This encompasses the digital therapy ecosystem (mobile apps, React portals, clinician tools), the .NET API layer, Cosmos DB-backed data platforms, WooCommerce commerce components, Azure Service Bus integrations, and the cloud infrastructure supporting regulated medical device workflows. The role operates at the intersection of DevOps, cloud operations, compliance, and product support, ensuring production systems meet uptime expectations and regulatory requirements while enabling rapid iteration for the Digital Solutions and Software Engineering teams. The SRE will establish and mature the operational reliability strategy, including incident management, performance monitoring, infrastructure automation, and continuous improvement, with a specific focus on supporting a regulated medical device and digital health environment. The systems managed by this role directly affect patients and device connectivity, making reliability and quality essential to business continuity and patient outcomes.

Requirements

  • Bachelor’s degree in Computer Science, Information Technology, or related field.
  • 5+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure roles.
  • Hands-on experience with cloud platforms (AWS, GCP, Azure), container orchestration (Kubernetes, Nomad), monitoring tools (Prometheus, Grafana, Datadog), and CI/CD pipelines and automation frameworks.
  • Strong understanding of CI/CD pipelines, version control workflows, and automated deployment practices, including the ability to build and maintain secure, reliable pipelines in Azure DevOps or GitHub Actions.
  • Knowledge of how APIs work, including endpoints, request methods, authentication mechanisms, response formats, error handling, and rate limiting.
  • Familiarity with web services technologies, including HTTP/HTTPS protocols, JSON, XML, and data serialization/deserialization methods.
  • Understanding of the data formats and protocols used in API communication, such as JSON, XML, CSV, and Protocol Buffers, and of data transformation techniques for converting data between formats.
  • Experience with integration platforms, preferably Dell Boomi, and ability to use these platforms to build, deploy, and manage integrations between systems.
  • Basic understanding of databases and SQL (Structured Query Language), with the ability to query databases, retrieve data, and perform data manipulations as part of integration processes.
  • Understanding of security principles and best practices for API integration, including authentication, authorization, encryption, and data privacy regulations (e.g., GDPR, HIPAA).
  • Goal-oriented, with solid planning and time management skills.
  • Excellent communication, follow-through, attention to detail, documentation, and collaboration skills.
  • Ability to think critically and use strong problem-solving skills.
  • A team-oriented personality with the initiative to accomplish goals.
  • Able to simultaneously manage many details and priorities.

Nice To Haves

  • Master’s degree or certifications (e.g., Azure, Kubernetes, SRE) are a plus.
  • Proven experience in regulated industries (healthcare, finance, etc.) is highly desirable.

Responsibilities

  • Serve as the operational owner for the production environment supporting Tactile’s digital solutions.
  • Lead incident response processes, coordinating with Digital, IT, Marketing, Operations, and Product Support teams.
  • Participate in the on-call rotation and oversee escalation pathways for Tier 2 and Tier 3 technical support.
  • Ensure post-incident documentation aligns with regulated quality expectations (e.g., CAPA inputs, RCA documentation in accordance with ISO 13485 / QMS processes).
  • Build and maintain end-to-end observability across native and hybrid mobile applications; patient, partner, and internal portals; device connectivity and data ingestion services; payment and WooCommerce commerce flows; and .NET backend services and Azure integrations.
  • Build dashboards and alerts in Datadog, Azure Monitor, or other preferred tools to detect anomalies.
  • Conduct database-level investigations for usage analytics, reliability metrics, and management-level reporting (patient usage trends, connectivity patterns, error rates).
  • Lead infrastructure automation using Terraform and Azure DevOps.
  • Automate monitoring configuration, system audits, log standards, and compliance-related reporting.
  • Collaborate with IT Security and Compliance to maintain operational readiness for HIPAA and internal QMS audits.
  • Support the transition of legacy components toward more scalable and modern cloud patterns where needed.
  • Define and maintain SLOs, SLIs, and reliability metrics that balance innovation velocity with platform stability.
  • Work closely with developers to embed reliability into CI/CD, code quality, test coverage, and deployment patterns.
  • Lead post-incident reviews and manage the continuous reliability improvement backlog.
  • Offer guidance on resilient architecture decisions, retry patterns, failure modes, and performant API design.
  • Act as a reliability subject matter expert across Digital Solutions, Product, Engineering, Product Support, and Security.
  • Ensure production change control aligns with quality and regulatory expectations.
  • Support compliance documentation for software releases, infrastructure changes, and security controls.

Benefits

  • Medical, dental and vision benefits
  • Retirement benefits
  • Employee stock purchase plan
  • Paid time off
  • Parental leave
  • Family medical leave
  • Volunteer time off and additional leave programs
  • Life insurance
  • Disability coverage
  • Other life and work wellness benefits and discounts