Lead Monitoring Engineer

SAIC · Washington, DC
Onsite

About The Position

This role is responsible for the effective discovery, monitoring, and management of IT infrastructure, including servers, cloud services, networks, applications, and storage systems, using OpenText Operations Bridge Manager (OBM), SiteScope, and other tools. Core monitoring duties include enterprise event consolidation, topology-based health analysis, performance metric tracking, event correlation configuration, and ongoing optimization of OBM to improve operational visibility and responsiveness. The position requires full-time onsite work at DOT HQ in Washington, DC.

Requirements

  • Must have extensive knowledge of multi-vendor server operating systems.
  • Minimum of 10 years of experience performing IT monitoring and managing and configuring monitoring systems
  • Expert-level systems administration experience managing Windows and/or Linux operating systems
  • Direct experience and expertise with management protocols, including SNMP and WMI
  • Scripting experience with PowerShell, VBScript, and/or other scripting languages
  • Experience managing monitoring systems with more than 250 hosts and/or more than 3,000 sensors
  • Proven track record of engineering monitoring solutions, providing strategic direction, and fostering a collaborative and innovative work environment.
  • Candidate must be a U.S. citizen or a green card holder who has resided in the U.S. for at least 3 years, and must be able to obtain a Public Trust.

Nice To Haves

  • Experience supporting a 24x7 operations environment, ideally supporting large Federal/Defense infrastructures.
  • Experience with the OpenText suite of tools, including AI Operations Management, Operations Bridge, SiteScope, and Optic.
  • Experience leading troubleshooting coordination or acting as Tech Lead during service outages that require collaboration across multiple teams and infrastructure components
  • Expert-level experience with scripting and automation
  • Experience integrating monitoring tools with ServiceNow
  • Experience automating the generation of service tickets from alerts
  • Strong understanding of ITIL and ITSM including monitoring, demand management, availability management, and capacity management
  • ITIL certification(s) at the Foundation level or above strongly preferred
  • Experience analyzing monitoring data and associated reports to drive business decisions on capacity and availability
  • Experience creating senior-level briefing products, including functional, data-driven dashboards built from captured performance data and availability metrics.
  • Experience with visualization and computational tools

Responsibilities

  • Assist in driving, standardizing, and managing a unified configuration management database (CMDB).
  • Deploy, manage, and update Management Packs, connectors, and monitoring policies to support business application and service monitoring needs.
  • Implement Management Packs, custom dashboards, third-party connectors, and monitoring automation to proactively detect, troubleshoot, and resolve service-impacting issues.
  • Perform event correlation and filtering to streamline incident triage, reduce noise, and ensure timely escalation to appropriate operational teams.
  • Integrate data sources from SiteScope and third-party monitoring tools such as Microsoft SCOM into the unified OpenText OBM event console.
  • Assess and fine-tune monitoring capabilities to deliver accurate, actionable alerts to the 24x7 operations systems.
  • Provide co-witness and correlation support during assessment and outage bridges to help resolve service disruptions and restore services.
  • Create alerts and notifications based on service availability.
  • Apply new solutions through research and collaboration with team members, and determine appropriate courses of action for monitoring enhancements and integrations.
  • Create intuitive, informative dashboards showing current and historical performance and service status.
  • Configure, maintain, and optimize monitoring dashboards to monitor health and performance across diverse IT infrastructure components.
© 2024 Teal Labs, Inc