Senior Infrastructure Engineer

Northbeam

$180,000 - $210,000 • Remote

About The Position

Northbeam, the whole company, is fundamentally a data product. We don’t sell shoes, ads, or games. We sell data: quality integrations with a variety of platforms, fresh and reliable data pulls, robust data ingest APIs, correct aggregations, and algorithmic insights on top of that data, all packaged up in a user-facing application.

As an Infrastructure Engineer at Northbeam, you will work with a small engineering team to create a scalable, observable, reliable platform for data ingestion, processing, and serving. Responsibilities will range widely and evolve, as they tend to at startups. The following all fall within the Infrastructure team’s remit:

CI/CD and dev infrastructure
Security
DR policies
SSL cert management and DNS
Infrastructure as Code
Capacity planning and budgeting
Observability
Scaling plans and automation
Guidance for best use of cloud infrastructure
… and more

Success in this role will require significant existing experience, an interest in continuously learning and iterating on existing solutions, a penchant for automation over “clickops” and one-offs, and the ability to collaborate effectively with engineers, auditors, product managers, and more. You’ll wear many hats, juggle diverse tasks, and probably find yourself learning new skills on the fly, while also teaching a few. If you thrive on variety and love diving in, you’ll fit right in.

Here’s a taste of what might come up in any given week:

Improve Observability: Work with engineering to iterate on alerts and dashboards, and help standardize and automate monitoring practices.
Automate All the Things: Pre-configured services, networking rules, access grants, the whole deal.
DevEx Par Excellence: Improve and iterate on “golden paths” that make it simple for developers to create new services with great CI/CD, deployment environments, secrets, routing, and beyond.
Tame Noisy Alerts: Hunt down the root causes of noisy alerts and tweak configurations so we only hear from the system when it matters. “Just see if it goes away” is not an acceptable reaction to an alert. Not on your watch.
Security Meets Sanity: Stay on top of patching, pen-testing, and training, and work to ensure the system is secure and meets the requirements of customers and auditors.
Capacity Management: Regularly audit cloud spend to identify and act on savings opportunities or misallocated resources; manage long-term compute reservations and monitor utilization; propose and implement improvements.

You will work with great people who have done this many times before. You will teach them some new tricks, and maybe learn some old ones. You’ll be a key player in keeping our platform reliable, efficient, and scalable. We’ll laugh, we’ll cry, we’ll tell stories of yore. If this sounds like your kind of chaos, we’d love to hear from you.

Requirements

Cloud Expertise: Solid hands-on experience with cloud computing and a knack for automating infrastructure.
Programming: Very comfortable developing (and debugging!) automation in Python, and able to connect problems in software engineers’ service code to how they manifest in the infrastructure and metrics.
Automation: You are an expert in Terraform and Atlantis, and you can discuss the differences and nuances of TF, Puppet, Chef, CF, and others at length.
Observability Know-How: Familiarity with monitoring and observability tools (e.g., Grafana, Prometheus, Datadog).
Problem Solving: A track record of resolving on-call issues and staying cool under pressure.
Collaboration: Comfortable working cross-functionally with engineering, data, product, and management teams.
Documentation Skills: You can write clear, concise technical documentation that other humans can actually follow.

Nice To Haves

Experience working in marketing, e-commerce, or ad-tech.
Basic familiarity with SQL and ETL processes.
Understanding of database administration processes and automation (user provisioning, backups, cloning and replication, index management, etc.).
GCP-specific experience.
Kubernetes experience.
