Site Reliability Engineer (Mainframe)

FIS Global•Milwaukee, WI

Hybrid

About The Position

In this role, you will help modernize critical applications, with a focus on improving observability, automation, and resiliency. You’ll work across both Mainframe technologies (COBOL, RPG) and modern server-based environments (Java, Angular, .NET), operating at the intersection of legacy systems and contemporary microservices. You’ll drive engineering improvements that directly enhance production support operations. This role is with our IBS Core Banking team.

Requirements

Education: Bachelor's degree in Computer Science, Information Systems, Engineering, or equivalent job experience.
10+ years of hands-on experience in software development.
Mainframe Technologies (Required): COBOL, RPG, JCL, CICS, SQL, CL, DDS, DDL, JES.
Modern Languages & Frameworks: Java, C#, Python, JavaScript, Spring Boot, Hibernate, JDBC, Angular, Oracle PL/SQL.
Automation & IaC: Python/Bash/PowerShell scripting, Terraform, Ansible, Jenkins, GitHub, Bitbucket, ServiceNow, Jira, Azure DevOps.
Monitoring Tools (Preferred): Splunk, Dynatrace, Resolve, Nobl9, JMeter, Zabbix.
Experience developing automation scripts (Python, Bash, PowerShell).
Preferred: Experience with Infrastructure as Code (Terraform, Ansible) and managing containerization (Docker, Kubernetes).
Experience applying automation and integration scripting across tools, including Jenkins, GitHub, Bitbucket, NUnit, Watir, Checkmarx, ServiceNow, Jira, and Azure DevOps.
Strong proficiency in scripting languages (Python, Bash) and related automation frameworks.
Experience with Jira, ServiceNow, Jenkins, and Docker.
Proficiency in modern application architectures (web, API), cloud platforms (AWS, Azure, Google Cloud), and IaC practices (Terraform, Ansible).

Nice To Haves

Experience supporting large-scale, client-facing enterprise systems.
Strong troubleshooting skills, including participation in incident management and post‑mortem processes.
Solid understanding of Linux and Windows operating systems.
Familiarity with SDLC, Agile/Scrum, and bi‑monthly production release cycles.
Exposure to mainframe environments (AS/400, z/OS) or willingness to learn.
Knowledge of FinTech, payments, or banking systems including API design and third-party integration.
Knowledge of FIS products/services and the broader Financial Services Industry.
Problem-Solving: A strong desire to troubleshoot complex technical issues and a hunger to maintain and grow existing skills.
Strong communication and collaboration skills.
Commitment to continuous learning, adaptability, and operational excellence.
Demonstrates judgment, flexibility, and a solutions-oriented mindset.
Takes ownership of engineering and product outcomes.
Action-oriented self-starter with strong execution skills.
Excellent interpersonal, negotiation, and influencing skills.
Penchant for excellence, intellectual curiosity, and continuous improvement.
Quickly establishes credibility with colleagues and partners.
Embodies and delivers the firm's values: Win as one team, Lead with integrity, Be the change.
Experience with development environments and tools including V7.4, Eclipse, Visual Studio, Azure DevOps, MDCMS, Git, RDi, X-Analysis, Hawkeye, and Checkmarx, as well as Microsoft Office tools such as Visio.
Familiarity with Application Performance Monitoring (APM) and Real User Monitoring (RUM) tools.
Hands-on experience with monitoring solutions such as Splunk, Dynatrace, Resolve, Nobl9, JMeter, and Zabbix for dashboards, alerting, and performance analysis.

Responsibilities

Automation: Identify automation opportunities and implement tools and processes that streamline routine tasks, enable scalable infrastructure, and support seamless deployments.
Reliability: Ensure the reliability, availability, and performance of applications and services. Develop and track new service level indicators (SLIs) to support service level objective (SLO) and service level agreement (SLA) compliance.
Monitoring: Design and maintain monitoring and alerting solutions that improve visibility into infrastructure, application performance, and user experience.
Capacity/Performance: Conduct capacity planning, performance tuning, and resource optimization in partnership with development and operations teams.
Documentation: Create and maintain clear documentation and knowledge base articles to promote knowledge sharing.
Disaster Recovery: Recommend and implement improvements to disaster recovery plans, backup strategies, and failover mechanisms.
Incident Response: Lead incident response as a subject matter expert, including identification, triage, resolution, and post-incident analysis.
Identify and drive improvements in reliability, performance, and efficiency through data analysis and root cause analysis.
Participate in an on-call rotation to respond to critical production incidents. You’ll join a globally distributed team that provides 24/7 coverage, ensuring fast triage, coordinated response, and seamless resolution of high-priority issues.
Application Enhancement: Partner with development, QA, DevOps, and product teams to influence design and drive application resiliency improvements.
Continuous Learning: Keep developing your skills through product training and technical training across multiple technologies with Pluralsight.