Site Reliability Engineer Principal

PNC
Pittsburgh, PA
Onsite

About The Position

At PNC, our people are our greatest differentiator and competitive advantage in the markets we serve. We are all united in delivering the best experience for our customers. We work together each day to foster an inclusive workplace culture where all of our employees feel respected and valued and have an opportunity to contribute to the company’s success.

As a Site Reliability Engineer Principal within PNC's Technology organization, you will be based in Phoenix, AZ or Pittsburgh, PA. PNC will not provide sponsorship for employment visas or participate in STEM OPT for this position.

The Site Reliability Engineer Principal:

  • Leads in identifying and establishing ways of stabilizing environments and sites while assessing opportunities to drive engineering stability through analytics and metrics.
  • Consults on technical issues and defines appropriate remediation to help shape the future technical direction of our stack.
  • Leads in the development and implementation of multiple automated monitoring and alerting sites to ensure the availability and performance of critical applications to support business strategy.
  • Engages with cross-functional teams to define and improve scalability and reliability metrics.
  • Engages in testing strategy approaches and results and in root cause analysis efforts, resolving underlying issues, driving continuous improvement in the incident management process, and reducing mean time to resolution.
  • Evaluates and limits risk and vulnerabilities during the software engineering process by consistently employing industry best practices.
  • Displays an innovative approach in applying modern principles, methodologies, and tools to advance business initiatives and capabilities.
  • Provides technical guidance, mentoring, and support to colleagues on solution development and Site Reliability Engineering.

PNC employees take pride in our reputation, and to continue building upon it we expect our employees to be:

  • Customer Focused: Knowledgeable of the values and practices that align customer needs and satisfaction as primary considerations in all business decisions, and able to leverage that information in creating customized customer solutions.
  • Managing Risk: Assessing and effectively managing all of the risks associated with their business objectives and activities to ensure they adhere to and support PNC's Enterprise Risk Management Framework.

Requirements

  • OpenTelemetry Expertise:
      • Implementing distributed tracing, metrics, and logging using OTel APIs, SDKs, and Collectors.
      • Configuration and deployment of the OpenTelemetry Collector (receivers, processors, exporters).
      • Experience with OTLP (OpenTelemetry Protocol).
  • Observability & Monitoring Tools:
      • Grafana LGTM stack (Loki, Grafana, Tempo, Mimir).
      • Prometheus, ELK stack (Elasticsearch, Logstash, Kibana), Jaeger, Splunk, Dynatrace, or Datadog.
  • Cloud-Native & Infrastructure:
      • Kubernetes (K8s): deep understanding of containers, deployment, and management.
      • Cloud: AWS (EKS, CloudWatch), Azure (AKS, Azure Monitor), or GCP.
      • Infrastructure as Code (IaC): Terraform, Ansible, or Helm.
  • Programming & Scripting:
      • Proficiency in languages like Go, Java, Python, or .NET (for instrumenting applications).
      • Scripting for automation (Bash, Python).
  • Distributed Systems Knowledge:
      • Understanding of microservices architecture and service mesh technologies (Istio, Envoy).
  • Ability to program (structured and OOP) using one or more high-level languages, such as Python, Java, C/C++, Ruby, and JavaScript.
  • Experience with distributed storage technologies.
  • Proactive approach to identifying problems, performance bottlenecks, and areas for improvement.
  • Previous success in technical engineering.
  • Coding experience beyond simple scripts.
  • Bachelor’s degree in computer science or related discipline preferred.
  • 5-7 years of experience preferred.

Nice To Haves

  • Application Development
  • Business Management
  • Computer Programming
  • Customer Solutions
  • Design
  • Group Problem Solving
  • Process Improvements
  • Release Management
  • Scripting
  • Software Solutions
  • Telemetry
  • User Experience (UX) Design

Responsibilities

  • Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding.
  • Partner with development teams to improve services through rigorous testing and release procedures.
  • Participate in system design consulting, platform management, and capacity planning.
  • Create sustainable systems and services through automation and uplifts.
  • Balance feature development speed and reliability with well-defined service-level objectives.

Benefits

  • PNC offers a comprehensive range of benefits to help meet your needs now and in the future.
  • Depending on your eligibility, options for full-time employees include: medical/prescription drug coverage (with a Health Savings Account feature), dental and vision options; employee and spouse/child life insurance; short and long-term disability protection; 401(k) with PNC match, pension and stock purchase plans; dependent care reimbursement account; back-up child/elder care; adoption, surrogacy, and doula reimbursement; educational assistance, including select programs fully paid; a robust wellness program with financial incentives.
  • In addition, PNC generally provides the following paid time off, depending on your eligibility: maternity and/or parental leave; up to 11 paid holidays each year; 9 occasional absence days each year, unless otherwise required by law; and between 15 and 25 vacation days each year, depending on career level and years of service.
  • To learn more about these and other programs, including benefits for full-time and part-time employees, visit pncthrive.com.