Staff/Senior Software Engineer, Infrastructure

Recruiting From Scratch • San Francisco or New York

2d•Hybrid

About The Position

Our client is a category-defining AI healthcare company building cutting-edge infrastructure to power real-time clinical insights from medical conversations. Their platform leverages advanced machine learning and large-scale distributed systems to transform unstructured healthcare data into actionable intelligence for providers. With significant funding, rapid enterprise adoption, and a world-class team spanning engineering, AI research, and clinical expertise, the company is scaling aggressively. This is a rare opportunity to join a high-growth environment solving deeply meaningful problems at the intersection of healthcare, AI, and infrastructure.

Requirements

9+ years of backend or infrastructure engineering experience
Strong experience building and scaling distributed systems in production environments
Deep expertise in performance optimization, system scalability, and reliability engineering
Proficiency in languages such as Python or TypeScript
Hands-on experience with cloud-native technologies (e.g., Kubernetes, GCP, Terraform)
Track record of improving system performance, reducing latency, and enabling scale
Experience working across teams to influence architecture and infrastructure decisions
Strong ownership mindset with the ability to operate in complex, fast-scaling environments
Excellent communication skills and ability to translate technical concepts across teams

Nice To Haves

Experience with load testing, chaos engineering, and performance benchmarking
Background in developer platforms, internal tooling, or platform engineering
Familiarity with SLOs, error budgets, and production reliability frameworks
Experience supporting multi-tenant or high-throughput systems
Prior experience in high-growth startups or scaling environments
Interest in AI/ML infrastructure or healthcare technology

Responsibilities

Design and optimize large-scale distributed systems to improve performance, reliability, and scalability
Build and integrate load testing and chaos engineering practices into CI/CD pipelines
Identify latency and performance bottlenecks using observability, profiling, and monitoring tools, and implement solutions at the code level
Drive architectural changes to migrate and scale applications across modern infrastructure (event-driven systems, cloud runtimes, databases)
Partner with engineering teams to re-architect applications for multi-tenant, high-scale environments
Develop internal developer tools and platform capabilities to improve engineering velocity
Define and implement SLOs, error budgets, and system health metrics to support reliable deployments
Improve incident response systems, observability, and operational excellence across teams
Collaborate cross-functionally and embed with teams to guide infrastructure adoption and best practices
Contribute to technical thought leadership through documentation, training, and potentially through external community engagement

Benefits

Competitive salary and equity package
Opportunity to work on high-scale, mission-critical systems in a rapidly growing company
Hybrid work environment in SF or NYC
Work alongside top-tier engineers, researchers, and healthcare professionals
Significant career growth opportunity in a hyperscaling environment
Exposure to cutting-edge AI, cloud-native, and distributed systems challenges
