Staff Infrastructure Engineer

Salient • San Francisco, CA

Onsite

About The Position

We're looking for a Staff Infrastructure Engineer to architect and own the systems that power Salient at scale. This is a high-impact individual contributor (IC) role focused on infrastructure engineering: you'll define how we build, deploy, and operate the infrastructure that processes millions of real financial transactions and customer interactions every day. You'll set the technical direction for reliability, scalability, and developer velocity across the stack so the platform can support whatever we build next.

Requirements

5+ years of software engineering experience, including 2+ years at the senior or staff level in infrastructure or platform roles working on large-scale distributed systems.
Deep expertise in cloud platforms (AWS or GCP): compute, networking, storage, IAM, and cost optimization.
Expertise in infrastructure-as-code, with a strong track record of building scalable automation systems.
Extensive experience owning and scaling Kubernetes and CI/CD systems in high-throughput production environments.
Track record of building and operating high-availability, high-throughput distributed systems with mature observability practices.
Strong technical communication: able to document architecture clearly and influence engineering decisions across teams.

Nice To Haves

Background in security.
Exposure to serving AI/ML workloads.
Combination of big tech and startup experience.

Responsibilities

Lead architectural decisions and technical reviews for infrastructure-critical initiatives.
Design, build, and own the cloud infrastructure (AWS/GCP) that runs Salient, from compute and networking to storage and observability.
Develop scalable harnesses that enable coding agents to operate reliably without compromising system stability or code quality.
Partner closely with the modeling team to optimize the serving and runtime performance of GPU-intensive workloads.
Drive reliability and performance across the stack by defining SLOs, building robust monitoring and alerting, and leading incident response and postmortems.
Own developer platform investments that materially improve engineering velocity, including CI/CD, deployment tooling, environments, and internal infrastructure abstractions.
Establish infrastructure best practices, patterns, and standards as a technical authority across the engineering org.
Identify and reduce technical debt across infrastructure systems, with a focus on long-term scalability and operational health.