Director, Software Engineering

Docusign•Seattle, WA

58d•Hybrid

About The Position

Docusign brings agreements to life. Over 1.5 million customers and more than a billion people in over 180 countries use Docusign solutions to accelerate the process of doing business and simplify people’s lives. With intelligent agreement management, Docusign unleashes business-critical data that is trapped inside of documents. Until now, that data was disconnected from business systems of record, costing businesses time, money, and opportunity. Using Docusign’s Intelligent Agreement Management platform, companies can create, commit, and manage agreements with solutions created by the #1 company in e-signature and contract lifecycle management (CLM).

What you'll do

Docusign operates global, always-on services. To safeguard customer trust and empower our engineering teams, we are building the next generation of monitoring and troubleshooting capabilities. You will lead the engineering organization behind our AI-powered observability platform. Built to complement existing tools like Grafana, Prometheus, and ClickHouse/Azure Data Explorer, this platform addresses the challenges of information overload by making observability accessible to everyone. You will manage a multidisciplinary team of backend engineers, machine learning engineers, and data scientists to build predictive, automated insights while running a highly critical service with strict SLAs. This position is a people manager role reporting to the Senior Director, Software Engineering.

Requirements

12+ years in software/infra/platform engineering, including 6+ years operating high-scale, 24x7 back-end systems.
3+ years managing managers and senior ICs, specifically leading multidisciplinary teams across domains like backend engineering and machine learning.
Proven ownership of mission-critical services with strict SLOs, incident management, and change control.
Experience hiring, developing, and retaining senior/principal talent and building inclusive, high-trust teams.

Nice To Haves

Familiarity with modern observability practices (metrics, logs, distributed tracing) and standards such as OpenTelemetry
Experience leading teams that build, train, and deploy machine learning models in production environments for AIOps, anomaly detection, or predictive analytics
Experience improving the performance and reliability of large telemetry/data pipelines at a multi-region scale
Exposure to multi-cloud environments and Kubernetes-based platforms

Responsibilities

Manage high-throughput, real-time observability pipelines processing massive volumes of telemetry data across multi-region, multi-cloud environments
Operate a Tier-0 data plane with strict SLOs, disciplined change management, and high availability requirements
Serve as the observability backbone for every engineering team's reliability and velocity, dramatically reducing the time from "something's wrong" to "here's the problem"
Set a clear 12-24 month vision for AIOps capabilities, focusing on automated troubleshooting workflows, proactive anomaly detection, and smart alert aggregation
Bridge the gap between robust backend infrastructure and applied machine learning, ensuring models for auto-threshold estimation and pattern recognition are effectively trained and reliably deployed at scale
Drive availability, durability, incident readiness, and disaster recovery for the observability plane; run regular resilience drills
Lead, hire, and grow a senior-heavy team of backend engineers, ML engineers, and applied scientists. Build an architecture culture, clear career paths, and a high-judgment, high-ownership operating model
Own the annual operating plan for the AIOps platform, covering capacity, availability, and budget, while meeting or beating committed SLOs/SLAs
Drive the development of intelligent capabilities like automated impact analysis, ML-driven threshold estimation, and natural language interfaces to reduce alert noise and accelerate debugging
Continually improve ingestion latency, query performance, storage efficiency, and cost per unit while maintaining reliability through traffic spikes and deploys
Partner with SRE, Telemetry Platform, Security, Finance, and Product; make pragmatic build-vs-buy decisions; manage vendors and capacity commitments
Lead on-call and incident command for the observability platform

Benefits

Bonus: Sales personnel are eligible for variable incentive pay dependent on their achievement of pre-established sales goals. Non-Sales roles are eligible for a company bonus plan, which is calculated as a percentage of eligible wages and dependent on company performance.
Stock: This role is eligible to receive Restricted Stock Units (RSUs).
Global benefits provide options for the following:
Paid Time Off: earned time off, as well as paid company holidays based on region
Paid Parental Leave: take up to six months off with your child after birth, adoption or foster care placement
Full Health Benefits Plans: options for 100% employer paid and minimum employee contribution health plans from day one of employment
Retirement Plans: select retirement and pension programs with potential for employer contributions
Learning and Development: options for coaching, online courses and education reimbursements
Compassionate Care Leave: paid time off following the loss of a loved one and other life-changing events
