Director of Platform & Reliability Engineering

Forge Global • New York, NY

1d • $235,000 - $245,000 • Hybrid

About The Position

The Director of Platform & Reliability Engineering will lead a critical engineering organization responsible for the systems, services, and operational practices that enable Forge to build and run secure, scalable, and highly reliable products. This leader will oversee Platform Engineering, Cloud Operations, and Site Reliability Engineering, setting the vision for how internal platforms, cloud infrastructure, developer enablement, and production operations evolve to support the company's growth. This is an exciting opportunity for a seasoned technical and people leader who can operate strategically while remaining close enough to architecture, delivery, and operations to guide strong technical decision-making. The ideal candidate will build high-performing teams, drive engineering excellence, improve reliability and developer productivity, and partner closely with product and engineering leaders to ensure Forge's platform capabilities scale with the business.

Requirements

8+ years of software engineering experience, including significant time leading infrastructure, platform, cloud, or reliability-focused teams.
5+ years of people leadership experience, including leading managers and building high-performing engineering organizations.
Deep experience with cloud infrastructure, infrastructure as code, observability, incident response, and modern platform engineering practices.
Strong technical judgment in distributed systems, production operations, service reliability, and scalable engineering architecture.
Experience defining engineering strategy, driving cross-functional alignment, and translating business priorities into platform and infrastructure roadmaps.
Bachelor's degree in Computer Science or a closely related field, or equivalent practical experience.
Excellent communication and stakeholder management skills, with the ability to influence technical and non-technical leaders.

Nice To Haves

Experience in FinTech, financial services, or another regulated industry.
Experience leading organizations through cloud modernization, platform standardization, or large-scale reliability transformations.
Strong familiarity with Kubernetes, container platforms, CI/CD systems, and infrastructure automation tooling.
Experience building developer platforms and self-service infrastructure capabilities that improve engineering productivity.
Experience at growth-stage companies where balancing scale, speed, and reliability is essential.

Responsibilities

Building and scaling the capabilities, teams, and operating model needed to deliver resilient infrastructure and strong internal engineering platforms across Forge.
Leading and developing the Platform Engineering, Cloud Engineering, and Site Reliability Engineering teams, including organizational design, hiring, coaching, and performance management.
Defining and executing the strategy for internal platforms, cloud infrastructure, reliability engineering, observability, and developer enablement.
Driving improvements in availability, performance, scalability, security, and operational maturity across production systems.
Establishing and evolving standards for incident management, service ownership, operational readiness, disaster recovery, and post-incident learning.
Partnering with product and software engineering leaders to create paved-road solutions that improve delivery speed, reliability, and developer experience.
Leading cloud capacity, cost, and architecture planning to ensure infrastructure investments align with business priorities and engineering demand.
Creating and monitoring meaningful service level objectives (SLOs), operational metrics, and executive-level reporting for reliability and platform health.
Guiding technical architecture decisions for cloud platforms, CI/CD, infrastructure automation, and runtime environments with a focus on resilience and maintainability.
Promoting a culture of accountability, continuous improvement, automation, and operational excellence across the engineering organization.
Collaborating with security, compliance, and risk partners to ensure platform and infrastructure practices meet the needs of a regulated business.