Senior Site Reliability Engineer, Infrastructure

Vultr

1d • $125,000 - $135,000 • Remote

About The Position

Vultr is seeking a highly skilled and experienced Senior Site Reliability Engineer to build and own the observability pipeline for the physical and provisioning infrastructure that powers Vultr's global datacenter footprint. The ideal candidate is a builder and operator with deep experience in infrastructure observability, strong datacenter hardware knowledge, and the ability to turn raw telemetry into actionable visibility for the teams that keep our datacenters running. This is a highly visible role at a high-growth technology company. It requires designing and building observability pipelines from the ground up, working across multiple internal stakeholder teams, and establishing the foundation for how Vultr monitors its physical infrastructure at scale. This is your opportunity to join our fast-growing team and leave your mark on Vultr and the future of cloud infrastructure.

Requirements

5+ years of experience in site reliability, platform, or infrastructure engineering in a production environment.
Hands-on experience building and operating observability pipelines, including metrics, logs, and alerting, using Grafana, Loki, Mimir, or equivalent tooling.
Working knowledge of datacenter hardware telemetry protocols including Redfish, IPMI, and/or SNMP.
Strong Linux fundamentals and operational experience in production infrastructure environments.
Demonstrated experience with infrastructure-as-code and configuration management tooling (Terraform, Ansible, Chef, or similar).
Strong cross-functional communication skills and experience delivering tooling for operational stakeholder teams.

Responsibilities

Design and build the observability pipeline for datacenter infrastructure, including CDUs, PDUs, bare metal servers, and provisioning workflows, collecting telemetry via Redfish, IPMI, SNMP, and OpenTelemetry.
Own the full stack, from data collection through visualization and alerting in Grafana, Loki, and Mimir.
Build dashboards and alerting that are actionable and meaningful for stakeholder teams, including Datacenter Ops, SysAdmin, Network, and Provisioning.
Establish standards and patterns for how datacenter infrastructure telemetry is collected, stored, and visualized across Vultr's global footprint.
Partner closely with stakeholder teams to understand their operational needs and translate them into observable, measurable signals.
Drive infrastructure-as-code practices across the observability pipeline to ensure consistency, repeatability, and maintainability.

Benefits

100% company-paid insurance premiums for employee medical, dental, and vision plans
401(k) plan with a 100% match up to 4%, with immediate vesting
Professional Development Reimbursement of $2,500 each year
11 Holidays + Paid Time Off Accrual + Rollover Plan
Increased PTO at 3-year and 10-year anniversaries
1 month paid sabbatical every 5 years
Anniversary Bonus each year
$500 stipend for remote office setup in first year + $400 each following year
Internet reimbursement up to $75 per month
Gym membership reimbursement up to $50 per month
Company-paid Wellable subscription