Staff Platform Engineer

Abridge•San Francisco, CA

$228,000 - $290,000•Hybrid

About The Position

Abridge was founded in 2018 with the mission of powering deeper understanding in healthcare. Our AI-powered platform was purpose-built for medical conversations, improving clinical documentation efficiencies while enabling clinicians to focus on what matters most: their patients. Our enterprise-grade technology transforms patient-clinician conversations into structured clinical notes in real time, with deep EMR integrations. Powered by Linked Evidence and our purpose-built, auditable AI, we are the only company that maps AI-generated summaries to ground truth, helping providers quickly trust and verify the output. As pioneers in generative AI for healthcare, we are setting the industry standards for the responsible deployment of AI across health systems. We are a growing team of practicing MDs, AI scientists, PhDs, creatives, technologists, and engineers working together to empower people and make care make more sense.

We have offices located in the Mission District in San Francisco, the SoHo neighborhood of New York, and East Liberty in Pittsburgh.

The Role

Abridge’s services and engineering teams are in hyperscale mode. We are looking for experienced Staff Platform Engineers to join our team and help scale our cloud infrastructure, developer platform, and operational maturity in kind. You’ll work on a centralized Platform team whose work spans platform architecture, developer enablement, infrastructure reliability, adoption, and ongoing support of existing tooling and software. This role is approximately 80% infrastructure focused and 20% application software focused. You will help us evolve our infrastructure stack into a scalable multi-tenant and multi-cloud platform, drive secure-by-default cloud infrastructure practices, build and manage modular Terraform platforms, and help define the long-term architecture and operational standards for infrastructure running in production at scale.
You’ll also help shape and integrate developer platform capabilities including service templates, canary releases, feature flagging, load testing, CI/CD pipelines, observability tooling, and self-service infrastructure workflows. The platform we are building needs to maximize engineering velocity, reliability, scalability, and security while operating under tremendous growth and regulatory requirements. This role presents opportunities to leverage deep technical expertise, systems thinking, autonomy, and organizational leadership to take platform capabilities from 0 to 1 and scale them across the company. This is a unique opportunity to help shape the future of AI-first, cloud-native, security-first infrastructure at scale.

Requirements

10+ years of software and infrastructure engineering experience, including significant experience operating infrastructure-as-code platforms in cloud-first organizations.
Experience designing and operating large-scale Kubernetes platforms and scaling compute services on Kubernetes; experience with related cloud-native technologies including ArgoCD, Argo Rollouts, Istio, etc.
Deep understanding of Kubernetes platform architecture and operations, including workload isolation, autoscaling, networking, service mesh management, ingress patterns, observability, upgrades, and multi-tenant cluster design.
Experience designing and maintaining CI/CD systems for both infrastructure-as-code deployments and application delivery workflows (Terragrunt, Atlas, ArgoCD, Octopus Deploy, Travis CI, etc.).
Experience building scalable infrastructure-as-code platforms using Terraform and related tooling, including modular architectures, remote state management, policy enforcement, deployment orchestration, and reusable infrastructure patterns.
Experience with monitoring and observability tooling and practices (metrics, logs, traces) and their management at scale. Experience with major observability platforms such as Grafana, Datadog, Honeycomb, etc.
Comfortable implementing and securing services in Google Cloud Platform as infrastructure-as-code, including GCP Projects, VPC Networks, Google Kubernetes Engine, IAM Roles, Groups, policies, and secure networking patterns.
Experience designing secure-by-default infrastructure including least-privilege access controls, workload identity, network segmentation, secret management, auditability, and compliance-oriented platform controls.
Strong operational instincts and experience debugging complex distributed systems, leading incident response efforts, and improving reliability through automation and observability.
Experience balancing developer experience, platform governance, operational reliability, and organizational scalability in fast-growing engineering environments.
Experience with backend languages (e.g., Python, Go, Node.js, Rust).
Stays up to date on industry best practices and tools, and enjoys learning new things.
Excited about being hands-on while also driving platform direction, architecture decisions, and operational maturity in a fast-moving and supportive environment.
Willing to pitch in wherever needed — as a fast-moving startup we need to do good work, quickly.
Demonstrates strong curiosity and a proactive interest in AI, actively exploring and applying emerging technologies.
This role has a rotational on-call schedule. You will have the opportunity to shape incident response practices, operational standards, and platform reliability strategy for the team and throughout the organization.

Nice To Haves

We value people who want to learn new things, and we know that great team members might not perfectly match a job description. If you’re interested in the role but aren’t sure whether or not you’re a good fit, we’d still like to hear from you.

Responsibilities

Design, build, and evolve cloud infrastructure platforms including networking, IAM, Kubernetes, databases, streaming and pubsub platforms, storage, distribution, observability, and more.
Lead the architecture and operational evolution of multi-tenant, multi-region, and multi-cloud infrastructure with strong reliability, scalability, and security boundaries.
Design and implement build pipelines, branching strategies, release management tooling, and self-service platform workflows that will serve an engineering organization that is rapidly growing in both size and operational complexity.
Design, implement, and scale secure-by-default cloud infrastructure practices including CI and deployment scans, least-privilege access controls, auditing, policy enforcement, and maintaining SOC 2 and HIPAA compliance.
Build reusable infrastructure abstractions, Terraform modules, golden paths, and developer platform capabilities that allow engineering teams to move quickly while maintaining operational consistency and governance.
Help advocate for, design, implement, and adopt fast and scalable application testing pipelines including end-to-end UI tests, hyperscale load tests, resiliency testing, and progressive delivery patterns.
Drive improvements in observability, operational readiness, incident response, SLO-driven reliability practices, and platform debuggability across the organization.
Bridge the gap between local development and production environments in a way that is seamless for engineers and maximizes engineering velocity, reliability, and security while minimizing quality issues arising from environment drift and configuration tangles.
Partner closely with engineering, security, and compliance teams to balance platform standardization with developer flexibility and evolving business requirements.
Influence infrastructure cost and capacity strategy by balancing reliability, scalability, performance, and operational efficiency across cloud environments.
Evangelize, document, mentor, and train the engineering team on the solutions being built and help uplevel the organization on cloud-native platform engineering strategies and operational excellence.
Be a public evangelist for Abridge in the global platform engineering community, including conferences, open source, and research as we pioneer new AI-first, cloud-native-first, security-first implementations at scale.

Benefits

Generous Time Off: 14 paid holidays, flexible PTO for salaried employees, and accrued time off for hourly employees.
Comprehensive Health Plans: Medical, Dental, and Vision coverage for all full-time employees and their families.
Generous HSA Contribution: If you choose a High Deductible Health Plan, Abridge makes monthly contributions to your HSA.
Paid Parental Leave: Generous paid parental leave for all full-time employees.
Family Forming Benefits: Resources and financial support to help you build your family.
401(k) Matching: Contribution matching to help invest in your future.
Personal Device Allowance: Tax-free funds for personal device usage.
Pre-tax Benefits: Access to Flexible Spending Accounts (FSA) and Commuter Benefits.
Lifestyle Wallet: Monthly contributions for fitness, professional development, coworking, and more.
Mental Health Support: Dedicated access to therapy and coaching to help you reach your goals.
Sabbatical Leave: Paid Sabbatical Leave after 5 years of employment.
Compensation and Equity: Competitive compensation and equity grants for full-time employees.