Staff Site Reliability Engineer

Manifold AI • Newton, MA

About The Position

Manifold is looking for a Staff Site Reliability Engineer (SRE) to work at the intersection of AI, data infrastructure, and life sciences. In this high-impact role, you will help design, build, and operate the multi-account AWS infrastructure that serves as the foundation for Manifold's platform. As a Staff SRE, you'll work closely with the Platform Engineering and Professional Services teams to ensure that Manifold's internal and customer-facing infrastructure is secure, scalable, and observable. SREs are expected to be fluent across a wide-ranging tech stack and comfortable working in a high-pressure, multi-threaded environment. They are also expected to balance a bias toward automation with more pragmatic approaches, and to have an intuitive understanding of the tradeoffs involved. The infrastructure you deploy and manage will play a key role in Manifold's success and help our customers bring life-changing medicines to patients faster.

Requirements

7+ years in infrastructure, DevOps, SRE, or platform engineering roles with increasing scope and autonomy.
You are both a leader and a doer: you can establish operational standards and drive technical direction while staying hands-on.
Deep, hands-on cloud (AWS, GCP, or Azure) experience, hands-on application development experience, and comfort troubleshooting application issues.
Significant infrastructure-as-code (Terraform) experience.
Strong CI/CD (GitHub Actions) experience.
Familiarity with identity systems (Okta, Auth0), containerized deployments (Docker, ECS, Packer), and networking tooling (Tailscale, WireGuard).
Working knowledge of data platform services, such as Snowflake, Airflow, dbt, and PostgreSQL.
Comfort managing complex, multi-account environments where customer isolation, security boundaries, and regulatory requirements add real constraints.
You move fast, make sound calls with the information available, and reset quickly when things don't go as planned.
A strong bias toward pragmatic, incremental process automation.
A track record of improving developer experience and reducing CI/CD friction.
The ability to collaborate effectively and positively with Platform Engineering, Professional Services, and customer IT groups.
You've built AI into how you work. You're curious about new tools, resourceful in applying them, and have concrete examples of how AI changed your output.
You've done your homework on what it means to accelerate life sciences research and you can articulate why that mission matters to you.

Responsibilities

Design and maintain infrastructure-as-code solutions, thinking holistically about topology and component dependencies. You will own everything from Terraform plans to production observability.
Automate customer infrastructure deployments, including multi-account provisioning, database setup, workflow orchestration, and application bootstrapping.
Manage CI/CD pipelines, including build reliability, test stability, and deployment automation for Manifold services.
Troubleshoot complex production issues across infrastructure, data, and application layers.
Leverage LLMs to the fullest extent to minimize toil and manage day-to-day operations.
Handle the networking, security, and compliance work required for highly regulated environments.