Sr. Incident Commander

Docusign
Remote

About The Position

The Senior Incident Commander is part of the SRE Incident Response team at Docusign. The role centers on leading and facilitating incidents and incident management processes for our products and security. It involves strategic project management, effective communication with stakeholders including executive leadership, and handling challenging incidents independently. The Senior Incident Commander plays a pivotal role in developing Docusign’s overall service excellence practice by creating standard operating procedures and training material, operationalizing action items, and providing valuable metrics for improvement.

The role also requires daily incident management support across Docusign’s infrastructure globally to maintain service levels. The Senior Incident Commander will facilitate resolution of all major incidents and handle communications via bridge calls and email. The role includes on-call responsibilities outside business hours and on weekends, daily reporting, ticket administration, and general production assurance duties. The ideal candidate is self-motivated and responsible, able to prioritize under heavy workloads and operate under time constraints. Adherence to established procedures and detailed documentation of incidents and resolution steps is essential. This position is an individual contributor role reporting to the Sr. Manager, SRE Incident Command.

Requirements

  • 8+ years of experience in Incident Management, including leadership of major incidents and high-severity situations
  • Experience in operating and implementing Incident Management tools
  • Experience with monitoring platforms and applications such as Prometheus, Grafana, Azure Data Explorer, and Incident.io
  • Experience with cloud and on-premise system architecture and design
  • Experience with troubleshooting techniques and problem-solving in a 24x7x365 environment

Nice To Haves

  • Experience analyzing incidents from the customer’s perspective and driving them through all phases to mitigation
  • Experience leading during incident calls, confidently driving towards resolution while communicating progress effectively to all stakeholders
  • Strong cross-functional collaboration, coordinating with multiple internal teams to ensure containment and remediation strategies are established, implemented, and carried out
  • Ability to lead incident calls confidently and independently to a successful resolution
  • Ability to understand and work within complex, large enterprise business environments
  • Process improvement experience, including conducting process analysis, identifying inefficiencies, and implementing recommended solutions
  • Experience managing complex security and privacy investigations
  • Excellent oral and written communication skills, with the ability to tailor messages for technical and non-technical audiences
  • Ability to work well interpersonally across various levels and disciplines, as well as influence and manage without direct authority
  • Skilled in understanding infrastructure dependencies and system integrations to perform troubleshooting in public/private cloud environments
  • Applied mitigation experience with microservices architecture, CI/CD pipelines, network architecture, data storage solutions, and virtualization across hybrid environments, ensuring rapid incident resolution, effective rollback practices, and minimized downtime in highly distributed systems

Responsibilities

  • Serve as a subject matter expert for Docusign’s incident management
  • Partner with the SRE team to manage complex and sensitive critical incidents to conclusion, identifying and resolving challenges to ensure timely resolution
  • Partner with Service Owners and SRE to craft high-quality root cause analyses (RCAs) and drive improvements across the domain to minimize the number and severity of incidents
  • Monitor, evaluate, and report on incident management programs, processes, and statistics to ensure continuous improvement, implementing automated procedures to capture this data consistently
  • Lead post-incident reviews (RCA) by working with Service Owners and SREs to identify root causes, propose actionable improvements, and implement processes that minimize the number and severity of future incidents
  • Leverage organizational data to analyze incident trends, operational success metrics, and key areas for improvement, enabling data-driven decision-making and proactive prevention strategies
  • Use advanced monitoring and automation tools to identify opportunities to reduce response times and ensure swift mitigation of risks, enabling more efficient management of major incidents and preventing incident recurrence
  • Regularly interact with senior leaders to facilitate effective incident handling or project delivery, producing suitable communications
  • Generate communications for multiple audience types, both customer-facing and internal
  • Prioritize incidents based on impact and urgency and classify them based on customer and operational impact, ensuring efficient resource allocation and effective resolution
  • Engage resources to resolve major incidents and minimize customer and business impact, managing escalation paths as necessary
  • Serve as an escalation point within the Incident Management process, contributing to and initiating Crisis Incident response processes and applying the escalation process when required
  • Analyze incident data for anomalies, correlations, and trends against operational success criteria to improve incident response and prevention strategies
  • Participate in a 24x7x365 rotational shift

Benefits

  • Bonus: Sales personnel are eligible for variable incentive pay dependent on their achievement of pre-established sales goals. Non-Sales roles are eligible for a company bonus plan, which is calculated as a percentage of eligible wages and dependent on company performance.
  • Stock: This role is eligible to receive Restricted Stock Units (RSUs).
  • Paid Time Off: earned time off, as well as paid company holidays based on region
  • Paid Parental Leave: take up to six months off with your child after birth, adoption or foster care placement
  • Full Health Benefits Plans: options for 100% employer paid and minimum employee contribution health plans from day one of employment
  • Retirement Plans: select retirement and pension programs with potential for employer contributions
  • Learning and Development: options for coaching, online courses and education reimbursements
  • Compassionate Care Leave: paid time off following the loss of a loved one and other life-changing events