Infrastructure Engineer

Knock • New York, NY

52d • Remote

About The Position

Knock is looking for an infrastructure engineer to join its small but growing platform team. The platform team is responsible for building, scaling, and maintaining the core services and infrastructure that run Knock. The role offers a high degree of ownership and autonomy in improving the Knock platform, starting with foundational infrastructure. The team is engineer-led and obsesses over the reliability and availability of its service. Knock is committed to building an inclusive and equitable team culture and particularly encourages applications from underrepresented communities. Candidates who do not match every point in the description but feel they are a good fit are encouraged to apply and share their unique experiences.

Requirements

4+ years of experience as a DevOps engineer or similar in a startup or mid-sized company, working with complex systems that operate at scale.
Experience working in and on production Kubernetes clusters using infrastructure as code (Terraform, Pulumi, or CloudFormation).
Experience working on complex AWS deployments (multi-account setups, complex VPC structures to support EKS, and EKS itself).
Experience operating and scaling different database technologies (Aurora Postgres, MongoDB, and ClickHouse preferred).
Some experience with, or familiarity in, operating and scaling queues and streams such as SQS, Kinesis, Kafka, or similar.
Strong problem-solving skills with a focus on reliability, scalability, and performance.
Strong communication skills, with the ability to work in a fully distributed, remote-first team.
Familiarity with AI tools like Cursor, Claude Code, Codex, or similar.

Responsibilities

Adopting a Terraform-backed EKS cluster, modernizing & maintaining it for elastic scale, reliability, performance, security, etc.
Troubleshooting performance in Postgres and in queues of every shape and size, and developing plans for scaling them 10x to 100x.
Identifying and correcting scaling issues before they affect customers by relying on and improving telemetry and traces in Datadog, AWS CloudWatch, and Honeycomb. This includes getting into the codebase to fix issues when necessary.
Maintaining and improving upon the company's >99.95% uptime track record.
Supporting the product engineering team in moving fast to deliver customer value by improving developer experience through canaries, faster cycle time, blue/green deploys, etc.
Joining on-call rotations on a schedule with the rest of the engineering team.
Communicating changes and bringing the rest of the team along, often in the form of runbooks & internal documentation.
