Software Engineering Group Manager - Site Reliability Center

PNC•Pittsburgh, PA

$146,300 - $298,870•Onsite

About The Position

At PNC, our people are our greatest differentiator and competitive advantage in the markets we serve. We are all united in delivering the best experience for our customers. We work together each day to foster an inclusive workplace culture where all of our employees feel respected and valued and have an opportunity to contribute to the company’s success.

As a Software Engineering Group Manager for PNC's Site Reliability Engineering Center, you will work within PNC's Information Technology Group, be located at one of our IT hubs (Cleveland, Ohio; Birmingham, Alabama; Pittsburgh, Pennsylvania; Dallas, Texas; Denver, Colorado; or Phoenix, Arizona), and manage the daylight shift.

The Site Reliability Center (SRC) is focused on establishing a culture of operational excellence by ensuring that infrastructure, platforms, and applications adhere to SRC onboarding standards that improve reliability, enable proactive issue resolution, and reduce customer impact. This role supports the vision of building a collaborative technology organization across application, infrastructure, and security teams to deliver a stable, reliable, and secure environment. The ideal candidate will help improve service performance, strengthen operational resiliency, and advance automation and observability initiatives that enhance the overall customer experience.

We’re looking for a Senior Group Manager to lead mission-critical production operations and site reliability engineering (SRE) for enterprise platforms that power real customer experiences. In this role, you won’t just keep systems running; you’ll shape how reliability is built, scaled, and continuously improved. This is an opportunity to lead high-impact teams, influence engineering strategy, and drive the evolution toward resilient, automated, customer-first platforms.

Requirements

8+ years of related experience and 5+ years of management experience.
Proven leadership experience in Production Operations, SRE, or Infrastructure Engineering.
Deep expertise in incident, problem, and change management within complex environments.
Passion for building reliable, scalable, customer-centric platforms.
Track record of improving operational metrics and leading high-performing teams.
Strong executive presence and communication skills.
Experience with OCP on the infrastructure side (alongside Linux/Windows) and with MongoDB and Cassandra on the database side (alongside Oracle and SQL), plus working knowledge of Elasticsearch, Redis, MQ, and Kafka, is a plus.

Nice To Haves

Application Development
Business Management
Customer Solutions
Design
Group Problem Solving
Leadership Management
Process Improvements
Release Management
Shift Planning
Site Reliability Engineering
Software Solutions
User Experience (UX) Design

Responsibilities

Lead during moments that matter: own major incident response for high-impact (P1/P2) events, ensuring rapid resolution and clear communication. Guide teams through complex outages as a calm, decisive leader. Improve incident response maturity to reduce downtime and customer impact.
Provide technical leadership in production support: serve as an escalation point for complex production issues; guide troubleshooting across applications, infrastructure (Linux/Windows), databases (Oracle, SQL), middleware, and integrations; ensure efficient log, metric, and system analysis; oversee batch/ETL monitoring and recovery processes; and foster strong collaboration across engineering, infrastructure, and vendor teams.
Drive root cause analysis and real fixes: champion a culture of accountability through deep root cause analysis; eliminate repeat issues by driving permanent, systemic solutions; and turn data and trends into actionable improvements.
Shape reliability at scale: define and evolve reliability strategy across availability, resiliency, and performance; lead improvements in uptime, MTTR, and overall system health; and partner with engineering to embed reliability into system design.
Modernize operations: advance observability with best-in-class monitoring, alerting, and event management; leverage tools such as Dynatrace, BigPanda, and LogScale to enable proactive detection; and drive automation that reduces manual effort and creates self-healing systems.
Ensure safe and reliable change: oversee change and release governance to enable fast, safe deployments; improve change success rates; reduce production defects; and lead post-release reviews that fuel continuous improvement.
Lead a global 24x7 operation: manage distributed teams supporting critical systems around the clock; create seamless handoffs and strong operational discipline across regions; and elevate team performance, engagement, and growth.
Build a trusted, compliant environment: ensure alignment with enterprise governance, audit, and regulatory standards, and strengthen risk management, controls, and operational documentation.

Benefits

medical/prescription drug coverage (with a Health Savings Account feature)
dental and vision options
employee and spouse/child life insurance
short and long-term disability protection
401(k) with PNC match
pension and stock purchase plans
dependent care reimbursement account
back-up child/elder care
adoption, surrogacy, and doula reimbursement
educational assistance, including select programs fully paid
a robust wellness program with financial incentives
maternity and/or parental leave
up to 11 paid holidays each year
9 occasional absence days each year, unless otherwise required by law
15 to 25 vacation days each year, depending on career level and years of service