Platform Engineer, AI Platform

Optura
San Francisco, CA
Remote

About The Position

Optura is healthcare’s AI orchestration platform. We help healthcare organizations transform disconnected AI pilots into a unified, enterprise-scale program that delivers measurable value. Our platform enables teams to design, execute, and monitor intelligent agents that drive automation, insights, and action, while providing the control and observability needed to scale safely. Built for real-world complexity, Optura supports multiple model providers, integrates with existing infrastructure, and offers both SaaS and self-hosted options. Our mission: revolutionize how healthcare deploys and operationalizes AI in production.

We’re looking for a Senior Platform Engineer to design, build, and operate the core services that power Optura’s AI Platform. In this role, you will own systems end-to-end, from model and agent orchestration to routing, reliability, and observability. You will partner closely with product and application teams to deliver secure, scalable, HIPAA-aware services, and you will play a critical role in shaping the foundation that enables customers to safely deploy AI in real-world healthcare environments.

Requirements

  • 5+ years of software engineering experience with strong proficiency in Python and TypeScript
  • 2+ years of experience operating AI systems in production (agentic workflows, RAG, orchestration, or similar)
  • Experience operating in cloud environments, including containers/Kubernetes (EKS or ECS) and Terraform
  • Experience designing and operating distributed systems with a focus on performance optimization and deep debugging
  • Experience with observability systems (metrics, tracing, logging) and on-call ownership

Nice To Haves

  • Experience working in healthcare or other regulated industries, including HIPAA or PHI-handling practices
  • Experience with LLMOps, including prompt management, evaluation frameworks, guardrails, and cost and latency tuning
  • Experience building or operating model gateways, traffic shaping, multi-provider routing, and caching at scale

Responsibilities

  • Build core platform services in Python and TypeScript for orchestration, routing, model gateways, retrieval-augmented generation (RAG), and evaluation pipelines
  • Leverage AI-assisted development tools (e.g., Claude, Cursor) alongside tests, linters, and benchmarks to improve velocity and quality
  • Own services from design through deployment, including SLO creation, dashboards, runbooks, and operational readiness
  • Improve reliability by optimizing system latency, availability, performance, and cost; lead and participate in incident response and postmortems
  • Develop production AI capabilities including guardrails, prompt and version management, offline and online evaluations, and multi-provider integrations
  • Build and maintain data and storage systems including vector search (pgvector, Pinecone, OpenSearch), caching, and Postgres/RDS patterns
  • Implement security and compliance best practices aligned to HIPAA, including RBAC, audit logging, least-privilege access, and secrets management

Benefits

  • Health, dental, and vision insurance
  • Generous paid time off
  • Opportunities for professional growth and development