Associate Principal, Infrastructure Engineering

The OCC, Chicago, IL
Hybrid

About The Position

The Associate Principal of Infrastructure Engineering is responsible for OCC's infrastructure engineering and for driving continuous improvement across the production environment. The role combines hands-on technical expertise with a strategic focus on operational excellence, and includes production administration of application scheduling and automation using Automic/UC4 software across AWS and on-premises platforms (Windows, MVS, Linux). The position serves as a technical evangelist for operational best practices, working cross-functionally with Production Support, Release Management, Platform Services, Development, Testing, and Business Operations teams to shift the organization from reactive to proactive operations.

Requirements

  • Strong customer-service orientation, with an understanding of the urgency of service level agreements
  • Excellent consultative, communication, analytical, and judgment skills
  • Strong problem-solving and decision-making abilities with capacity to perform well under pressure
  • Highly detail-oriented with strong time management skills
  • Ability to prioritize and align efforts with production availability requirements and business priorities
  • Ability to work effectively and interact with clients, team members, technical staff, consultants, and vendors
  • Adaptability to shifting priorities and willingness to embrace change
  • Natural curiosity with strong desire to share knowledge, cross-train, and mentor others
  • Pragmatic approach to problem-solving that optimizes available resources
  • Sound analytical and diagnostic skills for complex, ill-defined issues
  • Experience working in the financial services or clearing industry
  • Demonstrated ability to multi-task in a high-intensity environment with constantly changing priorities
  • Advanced proficiency with Automic/UC4 Scheduler and Automic scripting language
  • Strong knowledge of OS/390 JCL and IBM utilities
  • Proficiency with modern scripting languages (Python, PowerShell, Bash) and REGEX for pattern matching and validation
  • Experience with REST API integration and development
  • Proficiency with file transmission tools and protocols: Connect:Direct, Sterling Integrator, FTP
  • Experience with ServiceNow for incident, problem, and change management
  • Working knowledge of ITIL principles and processes
  • Proficiency with Unix/Linux commands and remote-access tools (PuTTY, WinSCP)
  • Experience across multiple operating systems (Windows, OS/390, Linux) in both cloud and on-premises environments
  • Strong understanding of technical monitoring, metrics, and alerting systems
  • Knowledge of application lifecycle and deployment processes
  • Strong PC skills and proficiency with the Microsoft Office suite, including Visio
  • Sophisticated understanding of enterprise technology architecture
  • Mature understanding of the relationship between software and hardware
  • Bachelor's degree in Computer Science, Engineering, Telecommunications, or related technical discipline (or equivalent combination of education and experience)
  • 5+ years of experience in infrastructure engineering and production systems operations

Nice To Haves

  • AWS cloud infrastructure experience and/or AWS certifications
  • Problem/Incident management certifications
  • Background in disaster recovery planning and execution
  • Experience leading continuous improvement initiatives
  • Experience with hybrid cloud/on-premises infrastructure management
  • Familiarity with legacy scripting languages (Perl, REXX)

Responsibilities

  Infrastructure Engineering & Production Administration:
  • Development Support: Code and maintain UC4 objects for development and testing teams; provide first-level support for non-production environments
  • Production Support: Provide second-level troubleshooting for scheduling issues in production environments
  • Automic Administration: Perform basic Automic/UC4 system administration including starting/stopping the Automic system and agents
  • Security Management: Maintain an accurate inventory of all department-owned logon IDs and passwords in CyberArk
  • Disaster Recovery: Provide support for all disaster recovery tests and exercises

  Enterprise Scheduling & Automation:
  • Complex Scheduling: Develop, maintain, and optimize job schedules using Automic/UC4 software across AWS cloud and on-premises platforms (Windows, MVS, Linux), managing multi-job dependencies and automated workflows
  • Scripting & Automation: Utilize Automic scripting language, Python, PowerShell, and other scripting tools to automate variable passing, job execution, and test environment setup; use REGEX for pattern matching and validation
  • File Transmission: Coordinate and test new file transmissions with exchanges and members using Connect:Direct, Sterling Integrator, and FTP
  • Batch Processing: Prepare and maintain batch jobs and REST API calls

  Operational Excellence & Continuous Improvement:
  • Proactive Operations: Lead the shift from reactive to proactive operational posture by identifying and addressing issues before they impact production
  • Performance Analysis: Analyze and report on production performance, capacity, and opportunities to improve critical-path processing
  • Metrics & KPIs: Develop, monitor, and report Key Performance Indicators to maintain compliance and drive measurable improvements through trend analysis
  • Problem Resolution: Identify and diagnose complex problems affecting production performance; stand up SWAT teams when needed to drive resolution of ongoing issues
  • Cross-Domain Collaboration: Work across network, database, storage, and application teams to assist with tuning and optimization
  • Risk Management: Surface environmental and operational risks; analyze repeating alerts to proactively identify issues
  • ITIL Leadership: Actively participate in and shepherd Incident, Problem, and Change Management processes using ServiceNow; ensure adherence to ITIL best practices
  • Process Automation: Advocate for and implement repeatable, scalable automated processes
  • Capacity Planning: Forecast system demands and recommend upgrades, expansions, and reconfigurations

  Documentation, Communication & Stakeholder Management:
  • Documentation: Maintain updated procedures on all supported products; create comprehensive process documentation
  • Status Reporting: Provide daily status reports to management; attend project and status meetings as required
  • Knowledge Sharing: Cross-train team members and stakeholders; deliver training on new product releases and best practices
  • Vendor Coordination: Manage vendor support relationships and drive issue resolution
  • Consultation: Serve as a consultant and evangelist for operational best practices across the organization

  Additional Responsibilities:
  • On-Call Support: Provide on-call and/or on-site support for installs, production issues, and system availability
  • Off-Hours Work: Participate in after-hours and weekend maintenance windows as required
  • Troubleshooting: Perform complex hardware and software troubleshooting, taking corrective actions or coordinating with IT staff and vendors

Benefits

  • A highly collaborative and supportive environment designed to encourage work-life balance and employee wellness
  • A hybrid work environment with up to 2 days per week of remote work
  • Tuition Reimbursement to support your continued education
  • Student Loan Repayment Assistance
  • Technology Stipend allowing you to use the device of your choice to connect to our network while working remotely
  • Generous PTO and parental leave
  • 401(k) employer match
  • Competitive health benefits including medical, dental and vision