Senior Reliability Engineer

U.S. Bank
Horsham, PA
$111,605 - $131,300

About The Position

The Senior Reliability Engineer is responsible for the stability, performance, and scalability of the PCS Data Warehouse platform. This role combines production support, engineering, and automation to reduce incidents, improve reliability, and increase delivery speed through code standardization, optimization, and AI-assisted development. The position partners with ETL developers, production support, and data owners to identify risk, optimize system performance, and enforce consistent engineering practices. This role leads the integration of AI-assisted development and automation across the platform.

The Senior Reliability Engineer supports the full lifecycle of the PCS Data Warehouse platform, including ETL processing, data pipelines, infrastructure, and operational tooling. The role focuses on preventing production issues, improving observability, and optimizing performance across code, data flows, and platform operations. This position operates across engineering and production support to ensure data availability, performance, and reliability for business-critical processes.

Requirements

  • Bachelor’s degree or equivalent experience
  • 5+ years in reliability engineering, production support, or data platform engineering
  • Strong experience with data warehouse environments, ETL, and SQL
  • Proven experience in performance optimization and system tuning
  • Experience supporting production systems and resolving incidents
  • Strong problem-solving and root cause analysis skills
  • Experience with monitoring and observability tools

Nice To Haves

  • Experience with Informatica PowerCenter, Netezza, Oracle, or similar platforms
  • Experience with Python, shell, or automation frameworks
  • Familiarity with CI/CD and release processes
  • Experience with AI-assisted development tools or prompt engineering
  • Financial services or large-scale data environment experience

Responsibilities

  • Ensure high availability, performance, and reliability of the PCS Data Warehouse
  • Proactively identify system risk and remediate before impact
  • Lead root cause analysis and implement permanent fixes
  • Reduce repeat incidents through automation and design improvements
  • Monitor data pipelines, batch processing, and platform health
  • Optimize ETL jobs, SQL queries, and data pipelines for performance and efficiency
  • Identify bottlenecks across data processing, scheduling, and infrastructure
  • Drive tuning of database workloads and batch execution windows
  • Improve throughput, reduce latency, and lower resource consumption
  • Standardize performance best practices across development teams
  • Own major incident response and coordination
  • Drive post-incident reviews and corrective actions
  • Improve MTTD and MTTR
  • Align with service management processes for incident, problem, and change tracking
  • Define and enforce coding standards for ETL, SQL, and data workflows
  • Perform and guide peer code reviews
  • Ensure consistent logging, error handling, and monitoring in all code
  • Partner with developers to improve reliability and maintainability
  • Lead adoption of AI-assisted development for ETL, SQL, and scripting
  • Implement reusable prompt standards and instruction libraries
  • Ensure AI-generated code meets governance, testing, and security standards
  • Automate repeatable development and operational tasks
  • Drive continuous improvement using AI-assisted recommendations
  • Design and implement monitoring, alerting, and telemetry
  • Build dashboards for performance and failure visibility
  • Improve proactive detection of issues
  • Standardize logging and metrics across the platform
  • Improve architecture scalability and resiliency
  • Automate deployments, validation, and recovery processes
  • Support release readiness and production implementation
  • Drive continuous platform optimization and modernization
  • Partner with ETL developers, data owners, and business teams
  • Work with production support to stabilize operations
  • Provide technical leadership and mentoring
  • Influence reliability and engineering standards across the platform

Benefits

  • Healthcare (medical, dental, vision)
  • Basic term and optional term life insurance
  • Short-term and long-term disability
  • Pregnancy disability and parental leave
  • 401(k) and employer-funded retirement plan
  • Paid vacation (from two to five weeks depending on salary grade and tenure)
  • Up to 11 paid holiday opportunities
  • Adoption assistance
  • Sick and Safe Leave accruals of one hour for every 30 worked, up to 80 hours per calendar year unless otherwise provided by law