Staff Platform Engineer

Rezdy

About The Position

We’re hiring a Staff DevOps Engineer to join Manifest, a new product being built in a high-autonomy, fast-moving environment. This is a hands-on, staff-level role for someone who can own critical infrastructure, improve the developer experience, and partner closely with product engineers, DevOps leadership, and technical leads. We’re looking for someone who can not only operate production systems but also design the guardrails, patterns, and platform capabilities that let the team move faster and more safely over time. This role is a strong fit for someone who enjoys working close to the product team, understands the realities of building in a startup-like environment, and can bring structure, reliability, and technical depth to the team.

Requirements

Deep production experience with AWS, especially services such as ECS/Fargate, RDS/Aurora PostgreSQL, VPC networking, load balancing, IAM, KMS, Secrets Manager, CloudFront, WAF, and related managed services.
Experience designing and operating systems that serve a global user base, including seamless multi-region availability and disaster recovery procedures.
A habit of treating reliability, scalability, performance, and observability as first-class design constraints, building them into designs from the start rather than bolting them on later.
Strong infrastructure-as-code experience. Pulumi with TypeScript is ideal, but deep experience with Terraform or another mature IaC approach is also valuable.
Strong operational knowledge of PostgreSQL, including performance investigation, connection pooling, backups, replication, locking, migrations, and safe schema-change patterns.
Experience designing and maintaining CI/CD systems, ideally with GitHub Actions, OIDC-based cloud authentication, container builds, environment promotion, required checks, and deployment gates.
Experience supporting containerized production workloads and improving deployment safety, rollback strategies, and runtime reliability.
Strong observability and incident response experience, including metrics, logs, traces, alerting, dashboards, runbooks, and post-incident learning.
The ability to work effectively in ambiguity, make pragmatic tradeoffs, and communicate clearly with both infrastructure specialists and product engineers.
A track record of raising the engineering bar through reusable patterns, documentation, automation, mentoring, and thoughtful technical leadership.

Responsibilities

Own and evolve the infrastructure that supports Manifest, including AWS environments, networking, compute, data services, observability, CI/CD, and operational tooling.
Work with Pulumi and TypeScript to define, maintain, and improve infrastructure as code across the platform.
Support and improve our containerized application platform, including deployment pipelines, rollback mechanisms, and runtime configuration.
Help operate and harden our data infrastructure, including connection pooling, backups, disaster recovery, replication, and safe schema-change practices.
Partner with engineers to improve the reliability and safety of releases, including database migrations, deployment workflows, environment management, and production readiness checks.
Improve CI/CD workflows so that builds, tests, infrastructure changes, and deployments are fast, reliable, and easy for engineers to understand.
Lead observability and incident readiness work, including alerting, dashboards, SLOs, runbooks, incident response practices, and post-incident follow-up.
Help ensure the platform is secure, cost-conscious, and maintainable as the product scales.
Mentor engineers on infrastructure, operations, reliability, and production ownership.
