Production Operations Specialist II

Bank of America•Richmond, VA

3d•Onsite

About The Position

This role is the first point of contact for requests or service failure incidents and maintains stability for a portfolio of applications. Key responsibilities include documenting or modifying knowledge, performing investigations, identifying incidents, mitigating impacts, engaging in triage, and working with technology teams to identify and resolve issues. Job expectations include following well-defined Standard Operating Procedures (SOPs) and partnering with experts to improve service levels by proposing changes to monitoring, alerting, and configuration.

Requirements

Able to work independently on the most complex projects, often across multiple phases.
Has working knowledge of business or function for which technical support is needed to diagnose or resolve problems.
Often responsible for the completion of a phase of a project.
Provides guidance and checks the work of less experienced associates.
Typically has 3 years' experience in IT production support or equivalent.
Proven team player who can work comfortably in a multicultural environment.
Proven ability to work independently, multitask, and work effectively in a complex environment with a global team structure.
Excellent verbal and written communication skills; strong influencer, facilitator, and collaborator.
Must be proactive, enthusiastic, flexible, and results-driven, with attention to detail.
Knowledge of Splunk, Dynatrace, SiteScope, Tivoli Netcool/WebGUI
Experience with Java Virtual Machine, Windows
Experience in a large IT production support environment
Basic understanding of, or exposure to, ITIL/ITSM.
Ability to work in non-contiguous shifts including the potential for weekend days.

Nice To Haves

ITIL Foundation/Intermediate certification
Experience in an ITIL-based role, such as Service Desk, Incident, Problem, or Change Management

Responsibilities

Monitors and supports application components and infrastructure critical to the business, such as relevant technologies and dashboards, responds to alerts regarding production incidents, and resolves issues prior to customer service interruption
Fulfills requests from users, operations, auditors, and regulators within service level agreements and drives operational excellence through process improvement and monitoring development efforts related to supported technologies
Onboards monitoring tools and applications in access system(s) of record to research potential production incidents, meet user requirements and service changes, and identify and implement automation opportunities in partnership with architects and engineers
Communicates status updates and technical details, such as infrastructure, application and client impact, and component points of failure to management, and provides reporting on environment and incident status in operational meetings
Performs environment routing and cycling, implements splash pages, and liaises with development teams to design and configure auto-provisioning, straight-through revocation (STR), and straight-through processing (STP)
Manages aged revocation monitoring to identify and fix defects in applications and systems of record
Prepares technical documentation and develops procedures for troubleshooting incidents to identify production failure scenarios, vulnerabilities, and improvement opportunities requiring escalation
Use monitoring tools to proactively identify and research potential production incidents.
Respond to alerts regarding potential production incidents. Escalate to advanced support as needed for problem resolution.
Perform trending and analysis using monitoring tools and reports in order to proactively identify and address potential issues prior to production impact.
Perform all environment routing, cycling, and implementation of splash pages.
Partner with Change Operations to support all Change implementations and proactively identify potential issues resulting from the changes.
Identify opportunities for additional monitoring and automation and partner with Monitoring Architecture and Engineering to implement.
Develop procedures for troubleshooting and possible resolution of issues.
Execute procedures reliably and escalate appropriately to solve incidents quickly.