Site Reliability Engineering Manager (Operations Manager)

FIS

47d•Hybrid

About The Position

We are FIS. Our technology powers the world’s economy and our teams bring innovation to life. We champion diversity to deliver the best products and solutions for our colleagues, clients and communities. If you’re ready to start learning, growing and making an impact with a career in fintech, we’d like to know: Are you FIS?

NOTE: This position is hybrid (3 days onsite) at our FIS Office locations in Jacksonville (FL) & Milwaukee (WI).

About the Role and Team: We are seeking a highly qualified Site Reliability Engineering Manager to lead operations within our fintech data center. The ideal candidate will bring deep expertise in SaaS platform reliability, server-side application management, and Site Reliability Engineering principles, and will demonstrate outstanding leadership abilities. Candidates must have a proven track record in incident resolution, compliance-driven service delivery, and managing complex infrastructure and cross-functional teams in the fintech sector.

This individual will play a key role in driving the modernization of critical applications, with a focus on improving observability, automation, and resiliency. You will lead a team that works across both mainframe technologies (COBOL, RPG) and modern server-based environments (Java, Angular, .NET), operating at the intersection of legacy systems and contemporary microservices. This is a great opportunity to drive engineering improvements that directly enhance production support operations.

This individual will be responsible for ensuring environment reliability, toil automation, and resiliency improvements through effective oversight. Key responsibilities include strategic planning, team leadership, and fostering collaboration between technical and business units to ensure that operational initiatives align with organizational objectives.

This role requires regular customer interactions regarding application performance, stability and reliability, including optimization and improved delivery of applications and services. This role is with our IBS Core Banking team.

Requirements

Extensive experience managing mission critical platforms, applications, and services, including at least 5 years in a leadership capacity.
7-10+ years of management experience in the software development life cycle.
Deep technical knowledge of, or familiarity with, the technical concepts related to the technologies below:
Mainframe Technologies (Required): COBOL, RPG; (Preferred): JCL, CICS, SQL, CL, DDS, DDL, JES, and mainframe environments (AS/400, z/OS), or willingness to learn.
Modern Languages & Frameworks (Required): Java, C#, Python, JavaScript, Spring Boot, Hibernate, JDBC, Angular, Oracle PL/SQL.
Automation & IaC (Required): Python/Bash/PowerShell scripting, Terraform, Ansible, Jenkins, GitHub, Bitbucket, ServiceNow, Jira, Azure DevOps.
Monitoring Tools (Preferred): Splunk, Dynatrace, Resolve, Nobl9, JMeter, Zabbix.
Experience working with Windows, Linux and IBM i operating systems, and administration of applications within these operating systems.
Comprehensive knowledge of data center infrastructure components, such as servers, networking, storage, and virtualization technologies.
Proficient in cybersecurity practices and data protection protocols relevant to data center environments.
Leadership Skills: Demonstrated ability to lead and motivate teams, coupled with strong communication and interpersonal capabilities.
Problem-Solving: Exceptional analytical skills and a commitment to continuous improvement.
Familiarity with SDLC, CI/CD, as well as DevOps and Site Reliability methodologies.
Resourceful and proactive in gathering information, resolving challenges, and promoting innovative solutions.
Excellent strategic thinking and innovation, supported by advanced problem-solving and analytical abilities.
Effective incident and problem management, including oversight and implementation of permanent solutions.
Outstanding communication skills and the ability to collaborate effectively with both technical and business stakeholders.
Well-versed in industry regulations and compliance standards pertinent to data center operations.
Bachelor’s degree in Computer Science, Information Technology, or a related discipline is required; a Master’s degree is preferred.

Nice To Haves

Past or current experience with data center operations and mission critical platforms.
Knowledge of building and maintaining FinTech, payment, or banking systems.
Strong understanding of data center
Understanding of industry regulations and compliance standards applicable to data centers and the fintech industry (e.g., PCI DSS, GDPR).
Extensive experience in data center operations management, preferably within the financial technology sector.
Proven track record in managing complex IT infrastructures and leading high-performing teams.
Knowledge of working in an Agile environment where production code is delivered bi-monthly.
Knowledge of FIS’ products and services.
Knowledge of the Financial Services Industry.

Responsibilities

Operational Management: Oversee SaaS platform server operations to maintain a high level of availability and stability for multiple applications including incident/problem analysis, change deployment, application performance, reliability, monitoring, and security.
Site Reliability Engineering Management: Evaluate and prioritize automation opportunities and lead the team to implement tools and processes that streamline routine tasks, enable scalable infrastructure, and support seamless deployments.
Service Reliability and Availability Management: Lead improvement of the reliability and availability of critical applications, platforms, and server infrastructure through proactive monitoring, incident management, and resiliency improvements. Guide the team to develop and track new service level indicators to support SLO and SLA compliance.
Team Leadership: Direct and mentor the operations team, promoting a culture committed to excellence and ongoing improvement.
Monitoring: Evaluate and interpret monitoring and alerting solutions that improve visibility into infrastructure, application performance, and user experience. Proactively identify improvement opportunities and implement effective corrective actions.
Strategic Planning: Formulate and execute strategic initiatives to enhance efficiency, including capacity planning, disaster recovery, and business continuity measures.
Disaster Recovery: Recommend and implement improvements to disaster recovery plans, backup strategies, and failover mechanisms.
Compliance: Ensure ongoing compliance with industry regulations, standards, and best practices, particularly in data security and privacy.
Innovation: Maintain up-to-date knowledge of emerging technologies and trends in Site Reliability Engineering, SaaS platform server management and fintech to drive continuous innovation within the team.
Infrastructure Management: Supervise maintenance, configuration, and reliability of all data center infrastructure, including servers, networks, and storage systems. Deliver a production server operations environment that meets all service level agreements, processing service level objectives, response time targets, and availability targets.
Security and Compliance: Oversee data security protocols and maintain adherence to regulatory and industry standards.
Incident Management: Lead incident management processes, ensuring rapid resolution and clear communication with stakeholders. Identify and drive improvements in reliability, performance, and efficiency through data and root cause analysis. Participate in an on-call rotation to support critical production incidents. You’ll join a globally distributed team that provides 24/7 coverage, ensuring fast triage, coordinated response, and seamless resolution of high-priority issues.
Capacity Planning: Strategically manage capacity to support future growth, ensuring the data center adapts to increasing demands without compromising security or performance.
Collaboration: Partner with cross-functional teams to align data center operations with overall organizational objectives.
Application Enhancement: Partner with development, QA, DevOps, and product teams to influence design and drive application resiliency improvements.
Risk Management: Proactively identify operational risks and develop strategies to mitigate disruptions or data breaches.
Operational Excellence & Client Engagement: Conduct regular service level reviews to evaluate platform and application performance, and manage a structured feedback loop to identify, track, and resolve recurring technology and application issues. Use this feedback to drive continuous improvement initiatives, prioritize remediation efforts, and inform release planning. Ensure that findings are documented, action items are tracked, and outcomes are communicated to leadership and product owners.

Benefits

A career at FIS is more than just a job. It’s the chance to shape the future of fintech.
A voice in the future of fintech
Always-on learning and development
Collaborative work environment
Opportunities to give back
Competitive salary and benefits
