Director, Infrastructure & SRE

TailorCare
Montreal, QC
Hybrid

About The Position

The Director of Infrastructure & SRE owns the function end-to-end: the reliability, security, scalability, and operational governance of TailorCare’s infrastructure, plus the team that delivers it. You will be a peer to the Director of Software Engineering, the Director of Data Engineering, and the Director of Data Science. You will own the Infrastructure & SRE scorecard in front of the executive team and lead Director-level vendor escalations with Salesforce, AWS, Cresta, and others.

This is a player-coach role. In year one you will spend roughly 60% of your time hands-on (writing Terraform, leading incidents, doing architecture work) and 40% building the team and the practice. As the team scales, that ratio shifts toward leadership, but you will never stop being technical.

This is not a slideware role. We are not hiring a manager who reviews architecture diagrams from a distance. We are hiring an operator who codes, runs incidents, owns the platform, and ships.

Requirements

  • 10+ years in Infrastructure Engineering, SRE, or DevOps, with 3+ years in a senior IC or tech lead role and 2+ years directly managing engineers
  • Recent hands-on technical work (within the last 12 to 18 months) in Terraform, AWS, and production incident response
  • Track record of hiring, leveling, and developing infrastructure or SRE engineers
  • Deep AWS expertise (VPC, IAM, ECS/EKS, Lambda, RDS, DynamoDB, S3, API Gateway, WAF, Amazon Connect)
  • Production Terraform experience at scale (modules, state management, multi-environment)
  • Hands-on with observability stacks (CloudWatch, Datadog, Grafana, or equivalents)
  • Demonstrated experience standing up SRE practices: SLOs, on-call, incident management, blameless postmortems
  • Experience operating in a HIPAA or comparably regulated environment (PCI, SOC 2 Type II, HITRUST, FedRAMP)
  • CI/CD pipeline design (GitHub Actions, GitLab CI, or equivalent)
  • Ability and willingness to travel up to 10% of the time for onsite meetings, team collaboration, and company events

Nice To Haves

  • Salesforce platform integration and operational experience
  • Amazon Connect or comparable contact center telephony platforms
  • Data platforms (Databricks, Snowflake, Fivetran)
  • HITRUST certification participation (e1 or r2)
  • AI/LLM-assisted operations tooling
  • Experience scaling an infrastructure function in a healthcare or other regulated growth-stage company

Responsibilities

  • Converge all AWS resources to Terraform; eliminate manual provisioning
  • Establish reproducible environments (dev, staging, production) with proper isolation and parity
  • Standardize CI/CD pipelines across all engineering teams
  • Define and operate SLOs, SLIs, and error budgets for all production systems (web/mobile applications, Salesforce, data processing, telephony stack)
  • Build observability (metrics, logs, traces, alerting) across AWS, Salesforce, telephony/omni-channel, and Cresta integrations
  • Stand up the infrastructure on-call rotation, incident management, and post-incident review discipline, including RCAs
  • Own uptime, MTTR, and incident-volume trends as published metrics
  • Design and implement a tested DR strategy with documented RPO/RTO commitments
  • Validate recovery procedures on a recurring cadence
  • Align DR posture with HITRUST and HIPAA expectations
  • Stabilize Salesforce, telephony/omni-channel, and Cresta integrations; close persistent gaps in skills-based routing, warm transfers, and telephony data parity
  • Partner with Data Engineering on the reliability of data ingest paths (Fivetran, SFTP, S3) and Salesforce Bulk API flows
  • Translate Security & Compliance policy into enforced infrastructure controls: IAM, encryption (at rest and in transit), network segmentation, secrets management, audit logging
  • Partner with Security & Compliance on HITRUST evidence, audit readiness, and remediation
  • Own vulnerability management across cloud and application layers
  • Fix DNS, SPF, DKIM, DMARC, and IP reputation issues that send patient and operational communications to spam folders
  • Own all TailorCare domain and email infrastructure
  • Build and maintain test, staging, and ephemeral environments engineers actually use
  • Reduce cycle time and remove infrastructure friction from the SDLC
  • Establish self-service tooling so engineers ship without filing tickets
  • Hire, level, develop, and retain the Infrastructure & SRE team
  • Own the function’s monthly business review (MBR) contribution: scorecard, risks, and decisions needed
  • Partner with Engineering, Data, Product, and Security & Compliance leadership as a peer
  • Other duties as assigned

Benefits

  • Medical, dental, vision, life, and disability insurance
  • Wellness resources
  • Employer HSA contribution
  • 401(k) plan with employer matching
  • Paid parental leave
  • Generous paid time off (PTO)
  • Holiday plans