Senior Engineering Manager, Engineering Operations

Ridgeline•Reno, NV

36d•$200,000 - $235,000•Hybrid

About The Position

As the Senior Engineering Manager for Engineering Operations at Ridgeline, you will lead a high-impact team that ensures our platform is reliable, observable, and cost-effective at scale. This role is responsible for defining and executing our strategy around incident response, FinOps, and system-wide telemetry, enabling Ridgeline's engineering and business leaders to make critical decisions with confidence. Your work will improve visibility, reduce friction, and unlock proactive insights across the organization. You'll leverage cutting-edge technologies, including AI tools like GitHub Copilot and ChatGPT, to elevate operational excellence and drive efficiency throughout Ridgeline's technical ecosystem.

At Ridgeline, how we work matters as much as what we build. Ridgeliners act like owners, choose growth over comfort, and communicate with transparency. We assume positive intent, bias toward action, and bring solutions, not just problems. We celebrate wins, learn from setbacks, and thrive in a resilient, collaborative, high-performing culture.

Requirements

10+ years of experience in SRE, infrastructure, or technical operations, including 3-6 years in a leadership role
Expertise in observability platforms like Datadog, Prometheus, ELK, or OpenTelemetry
Experience integrating technical telemetry with business metrics and cost models (e.g., cost-per-customer, MTTR, unit metrics)
Proven success scaling incident management frameworks and post-mortem processes
Proficiency with SQL, data modeling, or BI tools like Looker or Tableau
Strong collaboration skills and the ability to communicate technical insights to executive audiences
Calm, effective communicator who performs well under pressure and in incident response environments
Passion for continuous improvement, resilience, and mentorship

Nice To Haves

Prior experience in the FinTech or SaaS industry
Familiarity with AI/ML solutions in observability and operations
Experience managing infrastructure in a cloud-native environment (e.g., AWS, Kubernetes)

Responsibilities

Lead and evolve Ridgeline's observability and telemetry ecosystem to ensure critical metrics are trustworthy, actionable, and widely adopted
Define and execute the company-wide incident management strategy, enabling rapid response and continuous learning
Drive cost optimization and forecasting by scaling our FinOps practice with integrated usage and financial telemetry
Collaborate with Site Reliability Engineering (SRE) to create cross-system observability standards and ensure consistency in logs, metrics, tracing, and cost data
Build a unified metrics platform that combines operational, financial, and organizational performance data for real-time executive decision-making
Identify, automate, and eliminate high-frequency operational tasks using AI, reducing toil and increasing focus on continuous improvement
Define, track, and communicate KPIs for system reliability, operational efficiency, and infrastructure cost-effectiveness
Mentor and grow a diverse team of engineers, fostering a culture of ownership, learning, and transparency