Director of Incident Response

NorthMark Strategies, Dallas, TX

About The Position

The Director of Incident Response owns the full incident lifecycle across NMC²'s HPC and multi-tenant cloud environments, reporting to the CISO. This is a builder role. You will stand up the IR function from the ground up: playbooks, on-call rotations, tooling integration, forensic capability, and the team itself. You will run major incidents in environments where detection-to-containment is measured in minutes, where forensic preservation must survive tenant-specific legal hold requirements, and where every post-incident finding feeds directly into engineering backlogs with enforceable SLAs. You will operate as the senior IR authority across Security Engineering, Platform Engineering, Data Center Operations, and customer-facing technical teams.

Requirements

  • 10+ years in security operations or incident response, with at least 5 years running major incident response in high-availability, multi-tenant, or mission-critical infrastructure environments
  • 5+ years leading IR or SOC teams, including direct accountability for hiring, performance management, and 24/7 operational coverage
  • Demonstrated incident command experience on Sev-0 events with customer, regulatory, or board-level exposure
  • Deep technical fluency in at least two of: HPC environments (Slurm, InfiniBand, GPU clusters), Kubernetes and container security, hypervisor and bare-metal forensics, or public cloud incident response (AWS, Azure, GCP)
  • Working command of NIST SP 800-61r2, MITRE ATT&CK, and CIS Controls v8 Incident Response domain (Controls 17.1 through 17.9)
  • Hands-on experience with Jira Service Management, PagerDuty or equivalent, and at least one enterprise SIEM or XDR platform in a production IR context
  • Experience building detection-to-response feedback loops with a detection engineering or SOC counterpart, not operating IR as a downstream consumer of alerts
  • Track record of RCA work that produced engineering remediation with measurable defect reduction, not documentation for its own sake
  • Comfort operating in a pre-scale organization where tooling, process, and team do not yet exist and must be designed before they can be run
  • Must be legally authorized to work in the United States without the need for employer sponsorship, now or at any time in the future

Nice To Haves

  • GCIH, GCFA, GCFR, or equivalent hands-on IR certification
  • ITIL 4 Foundation or Practitioner certification
  • Experience with regulated or contractually constrained environments: financial services customers, export-controlled workloads, or sovereign cloud requirements
  • Prior experience working on a CSP independence or infrastructure repatriation program

Responsibilities

  • Build the IR function end to end: staffing model, 24/7 coverage plan, severity matrix, escalation tree, retainer relationships, and tooling stack aligned to NIST SP 800-61r2 phase structure
  • Own major incident command for all Sev-0 and Sev-1 events, security and operational, including customer-facing communications and regulatory notification decisions
  • Develop detection-to-containment runbooks mapped to MITRE ATT&CK techniques relevant to HPC and cloud tenancy threats: credential abuse (T1078), lateral movement via Kubernetes and scheduler primitives (T1610, T1613), data exfiltration over research network egress (T1041, T1567), and supply chain compromise in scientific software pipelines (T1195)
  • Establish forensic readiness across bare-metal HPC nodes, Kubernetes workloads, and hypervisor layers: memory capture, disk imaging, container runtime evidence, and audit log chain-of-custody standards
  • Drive root cause analysis to engineering remediation with measurable closeout SLAs, not written reports that sit on a Confluence page
  • Build and maintain the Known Error Database, runbook library, and tabletop exercise program with scheduled red team, customer-triggered, and infrastructure failure scenarios
  • Instrument the IR function with hard metrics: MTTD, MTTA, MTTC, MTTR by severity and incident class, recurrence rate, playbook coverage percentage, and on-call load distribution
  • Operate Jira Service Management as the authoritative incident system of record, with defined integrations to detection tooling, paging (PagerDuty or equivalent), and engineering backlog systems
  • Partner with Security Engineering on detection engineering feedback loops: every incident either validates an existing detection, triggers a new one, or exposes a detection gap that becomes a tracked engineering item
  • Own executive and board-level incident reporting, including quarterly trend analysis, regulatory and contractual incident disclosures, and customer trust reporting for enterprise accounts
  • Co-own business continuity and disaster recovery testing with Platform and DC Operations, ensuring IR plans integrate cleanly with BCP/DR runbooks

Benefits

  • Company-Paid Lunch Stipend: Lunch is provided via GrubHub
  • Company-Paid Benefits: 100% Employer-Paid Medical in our High Deductible Health Plan, Dental and Vision benefits for employees and their families, 16 weeks of Paid Parental Leave, Employee Assistance Program, Life insurance, Short-Term Disability and Long-Term Disability
  • 401(k): Company will match 100% of your contributions up to 6%
  • Optional Employee-Paid Benefits: Medical insurance in our PPO plan and a variety of other benefits such as Health Savings Accounts (with Company Contribution!), Flexible Spending Accounts, Supplemental Life Insurance, Wellhub, and more
  • Time Off: 25 days of Paid Time Off plus 12 company holidays