Head of Platform Operations

Claritas Rx
$220,000 - $250,000

About The Position

We are seeking a Head of Platform Operations to lead the operational excellence of our SaaS application, analytics platform, DevOps functions, and data platform. Reporting to the CTO, you will own the reliability, scalability, and security of the production systems that power our data insights. This strategic role requires a hands-on leader who can manage both onshore and offshore teams, oversee incident management and compliance (HIPAA/SOC2), and drive the evolution of our DevOps infrastructure. You will partner closely with Product Management and Engineering Directors to ensure our platform meets the rigorous demands of the healthcare ecosystem.

This role focuses on building, running, and evolving the tools and processes that keep our data platform healthy and compliant, including custom AWS CloudWatch metrics, dashboards, and alarms; incident management workflows in Atlassian Jira; and operational SOPs. You will lead, build, and govern an offshore operations team and will be responsible for access management and PHI audit compliance. You will regularly produce post-mortem writeups, operational review reports, and customer-facing documentation.

Anticipated travel for the role is up to 20%.

Requirements

  • BS degree in Computer Science, Math, or related STEM fields, or comparable body of work.
  • 10+ years of experience in Software Engineering or Operations, with significant experience in a leadership role.
  • Proven track record of managing SaaS platforms and DevOps functions in a high-compliance environment (healthcare preferred).
  • Deep expertise in AWS Cloud Native Development (MySQL, Athena, PySpark, Lambda, CloudWatch).
  • Strong programming background with 10+ years of experience in languages such as Python, TypeScript, or Go.
  • Advanced SQL skills (10+ years) including performance tuning and complex data modeling.
  • Experience with containerization and modern CI/CD tools.
  • Experience managing 24/7 on-call rotations for production systems and handling severe customer-impacting incidents.
  • Ability to define and govern offshore teams, ensuring operational capability without compromising security.
  • Strong communication skills for cross-functional collaboration and customer-facing reporting.

Responsibilities

  • Own all operational activities for the SaaS platform and data services post-deployment, ensuring 24/7 availability and performance.
  • Lead, build, and govern the operations team, including managing offshore specialists and defining access controls to prevent PHI exposure.
  • Establish and maintain operational SLAs and KPIs, producing regular operational review reports and customer-facing post-mortem writeups.
  • Drive the strategy for Cloud Native infrastructure, overseeing CI/CD pipelines, container management (Docker/Kubernetes), and deployment automation.
  • Develop and maintain comprehensive monitoring and alerting systems (using tools like AWS CloudWatch) to ensure rapid issue detection and system health.
  • Manage the "Run" function of the platform, ensuring zero-downtime deployments and robust infrastructure-as-code practices.
  • Partner with Data Engineering to monitor ETL/ELT pipelines. Ensure data arrives on time and meets quality standards (freshness/completeness).
  • Manage the operational health of data and application infrastructure tools (e.g., Snowflake, Airflow, Redshift), including upgrades and capacity planning.
  • Implement and enforce data access controls across all layers of the platform. Ensure developers and analysts have the access they need without violating "Least Privilege" policies.
  • Implement and manage incident response workflows (e.g., in Jira), ensuring clear remediation processes and documentation.
  • Oversee access reviews and compliance reporting to support HIPAA and SOC2 policies, serving as the primary owner for audit readiness.
  • Create and maintain SOPs, runbooks, and operational procedures to institutionalize best practices.
  • Identify recurring operational issues and partner with engineering to drive reliability and automation improvements.
  • Gather, analyze, and report on operational metrics, incident trends, and system health.
  • Execute the technical controls required for SOC2, HIPAA, and ISO audits. Ensure evidence is collected automatically and continuously.
  • Manage the patching cadence. When security scans find vulnerabilities, you are responsible for prioritizing and deploying the fixes across the fleet.