About The Position

The Senior Incident & Automation Engineer serves as a critical bridge between the Technology Incident Optimization Program and the core Compute, Virtualization, Cloud Services, and Storage technology domains. This role demands deep technical expertise combined with strategic thinking to drive tactical incident reduction while architecting the future state of intelligent event management and automation. You will build automated incident remediation workflows and achieve measurable incident reduction within your domain through event optimization, correlation, and automation, while maintaining and enhancing comprehensive observability. This position offers a unique opportunity to shape the future of enterprise event management.

Requirements

  • Bachelor's degree in Computer Science, Information Technology, Computer Engineering, or a related technical field.
  • A minimum of 8 years of hands-on experience in IT operations, infrastructure engineering, or system architecture within large-scale enterprise environments.
  • Demonstrated success leading event management and incident reduction initiatives with quantifiable results.
  • Direct, hands-on experience with modern AIOps and event management platforms is required.
  • Deep understanding of enterprise infrastructure including virtualization architectures, container orchestration, microservices, and various storage architectures (block, file, object).
  • Expertise with a broad range of domain-specific monitoring tools for compute, virtualization, storage, and cloud platforms.
  • Hands-on experience developing robust automation solutions using scripting languages and modern automation frameworks.
  • Proficiency in log analysis, pattern recognition, and using query languages for data analysis on log aggregation platforms.
  • Excellent analytical abilities with a systematic approach to troubleshooting complex issues and a holistic view of technology systems.
  • Exceptional communication skills with the ability to influence and collaborate effectively across diverse, cross-functional teams and present technical concepts to various audiences.

Nice To Haves

  • An advanced degree (Master's) in a relevant technical field.
  • Relevant industry certifications (e.g., Cloud, Virtualization, Automation, ITIL).
  • Experience with AIOps, machine learning for IT operations, and Site Reliability Engineering (SRE) practices.
  • Knowledge of ITSM platforms, CMDB management, and infrastructure-as-code (IaC) principles.
  • Familiarity with financial services regulatory requirements.

Responsibilities

  • Conduct comprehensive analysis of alert and incident patterns to identify top sources of operational noise, determine root causes, and develop data-driven strategies for reduction.
  • Design, implement, and optimize rules for event correlation, de-duplication, and suppression on AIOps and event management platforms. Develop domain-specific correlation logic leveraging configuration management data and infrastructure topology.
  • Architect and develop automation playbooks for incident data enrichment and create self-healing capabilities for common and recurring infrastructure incident scenarios.
  • Assess the current observability footprint across all infrastructure domains to identify gaps and propose enhancements that align with enterprise event management standards.
  • Partner closely with infrastructure operations, engineering, and platform teams to understand incident drivers, validate correlation logic, and provide expert guidance on event management best practices.
  • Continuously validate the effectiveness of implemented rules and automation to ensure no business-impacting alerts are missed. Monitor and report on alert quality metrics and lead iterative improvements.

Benefits

  • Medical, dental, and vision coverage
  • 401(k)
  • Life, accident, and disability insurance
  • Wellness programs
  • Paid time off packages, including planned time off (vacation), unplanned time off (sick leave), and paid holidays