Site Reliability Engineer

Orchard
Phoenix, AZ
Onsite

About The Position

Site Reliability Engineer (Application Support)
Phoenix, Arizona
US citizens and permanent residents only (no sponsorship is available).

@Orchard LLC is actively recruiting a Site Reliability Engineer to support a major management consulting engagement in the Phoenix, AZ area. This is an on-site role at the client, responsible for ensuring platform stability, scalability, and resilience. The environment spans mainframe, ETL pipelines, databases, and distributed systems. This role will be key to minimizing downtime, improving performance, and enabling reliable delivery of business-critical services.

Requirements

  • Bachelor's degree or higher in Computer Science or equivalent preferred, but not required.
  • 5+ years of experience in application support with a strong understanding of monitoring, alerting, and observability.
  • Strong understanding of Site Reliability Engineering principles (SLIs, SLOs, SLAs).
  • 5+ years of experience in incident management and root cause analysis (RCA).
  • Demonstrated understanding of mainframe environments (z/OS, JCL, batch processing).
  • The ability to work with mainframe teams and understand batch job dependencies and failures.
  • Working knowledge of databases and data platforms (Oracle, SQL Server, DB2, Teradata, Hive, etc.).
  • Experience with performance monitoring tools such as Dynatrace, Splunk, or ELK.

Nice To Haves

  • Medallion Architecture
  • Containerization (Kubernetes / Docker)
  • Experience with marketing applications (Pega, Salesforce, Adobe, Zafin, Naehas)
  • Visual Basic (.NET)
  • SaaS-based solutions
  • Financial services industry knowledge

Responsibilities

  • Support and maintain mission-critical applications built with COBOL, DB2, Pega, VB.NET, and Java, including diagnosing and resolving application and database performance issues.
  • Monitor and maintain the health, performance, and reliability of large-scale Hadoop clusters and big data environments, ensuring optimal resource utilization and uptime.
  • Develop, automate, and optimize data pipelines using SQL, Python, and PySpark for efficient data ingestion, transformation, and processing.
  • Troubleshoot and resolve complex issues related to Informatica ETL processes, ensuring data quality, consistency, and timely delivery.
  • Implement and enforce best practices for site reliability, including automated monitoring, alerting, and incident response for both big data platforms and legacy systems.
  • Collaborate with development, QA, and infrastructure teams to support application deployments, upgrades, and integration across diverse technologies.
  • Document operational procedures, incident reports, and system configurations to support knowledge sharing and business continuity.
  • Continuously evaluate and recommend improvements for system scalability, security, and reliability in both big data and legacy application environments.
  • Ensure data security, governance, and compliance standards are met within all data engineering processes.