Staff Software Engineer, Scaling AI Systems

Abridge • San Francisco, CA

1d • $228,000 - $290,000 • Remote

About The Position

Abridge’s services and engineering team are in hyperscale mode. We are looking for experienced software engineers to join our team and help improve the performance, stability, and scalability of our software by multiples. This is a distributed-systems-oriented role, approximately 80% software focused and 20% cloud infrastructure focused. You will help us build load testing and chaos engineering into our development cycle, use observability and profiling tools to identify and resolve performance bottlenecks, work with diverse teams to rehome their applications onto more scalable platforms, and ensure a smooth ride as we hyperscale adoption of our application in the healthcare space. You may be embedded with application teams for weeks or months. The platform we are building needs to maximize both engineering velocity and security, will operate at tremendous scale, and presents many opportunities to apply creativity, autonomy, and leadership to take things from 0 to 1. This is a unique opportunity to grow your career quickly at a rapidly growing company that leverages the best of emerging technologies.

Requirements

10+ years of software engineering experience focused on distributed systems or tooling, with an interest in engineering enablement and software scaling.
Experience as a back-end engineer focused on system performance and scalability.
Experience reducing software latency by multiples using observability and profiling tools, and taking great pleasure in doing so.
Experience building on Kubernetes and scaling compute services on Kubernetes; experience with related cloud native technologies including ArgoCD, Argo Rollouts, Istio, etc.
Comfortable implementing and securing services in Google Cloud Platform with Infrastructure as Code, including GCP projects, VPC networks, Google Kubernetes Engine, and IAM roles, groups, and policies. Candidates without GCP experience who have experience with Kubernetes are encouraged to apply.
Experience building software with backend languages (e.g., Python, Go, Node.js, or Rust).
Experience monitoring distributed systems with Prometheus, OpenTelemetry Collector, and Grafana (or something similar), including metrics collection, visualization, alerting, and using observability data to drive performance optimizations.
Passion for engineering enablement and solving software and distributed systems scaling challenges under pressure.
Must be willing to travel up to 10% of the time.

Responsibilities

Leverage load testing, chaos engineering, and other test practices to identify performance and latency bottlenecks across all of our systems, and make changes to application code to resolve them.
Drive code-level software changes that rehome applications onto new infrastructure (runtimes, event-driven infrastructure, databases, and more) in order to dramatically improve scalability and enable multi-tenant deployments.
Identify and implement software configuration changes and performance tuning parameters that will dramatically improve performance and scalability.
Build developer tools and software modules that help engineers across the entire engineering organization write code faster and more effectively.
Work with the Platform team to develop, and application teams to adopt, emerging elements of our internal developer platform, such as service templates and self-serve infrastructure.
Work with application teams to establish and adopt SLOs and error budgets, and develop better application health metrics that can drive automated canary releases, improved health monitoring, and better engineering practices.
Uplevel our ability to respond to incidents by improving observability, runbooks, and incident response muscle across the organization.
Evangelize, document, and train the engineering team on the solutions being built and uplevel them on cloud native design strategies and tools.
Be a public evangelist for Abridge in the global platform engineering community, including conferences, open source, and research, as we pioneer new AI-first, cloud-native-first, security-first implementations at scale.

Benefits

14 paid holidays.
Flexible PTO for salaried employees.
Accrued time off for hourly employees.
Medical, Dental, and Vision coverage for all full-time employees and their families.
Generous HSA Contribution: If you choose a High Deductible Health Plan, Abridge makes monthly contributions to your HSA.
Generous paid parental leave for all full-time employees.
Family Forming Benefits: Resources and financial support to help you build your family.
401(k) Matching: Contribution matching to help invest in your future.
Personal Device Allowance: Tax-free funds for personal device usage.
Pre-tax Benefits: Access to Flexible Spending Accounts (FSA) and Commuter Benefits.
Lifestyle Wallet: Monthly contributions for fitness, professional development, coworking, and more.
Mental Health Support: Dedicated access to therapy and coaching to help you reach your goals.
Sabbatical Leave: Paid Sabbatical Leave after 5 years of employment.
Competitive compensation and equity grants for full-time employees.