Director, Software Engineering

Docusign
Seattle, WA
Hybrid

About The Position

Docusign brings agreements to life. Over 1.5 million customers and more than a billion people in over 180 countries use Docusign solutions to accelerate the process of doing business and simplify people’s lives. With intelligent agreement management, Docusign unleashes business-critical data that is trapped inside of documents. Until now, these were disconnected from business systems of record, costing businesses time, money, and opportunity. Using Docusign’s Intelligent Agreement Management platform, companies can create, commit, and manage agreements with solutions created by the #1 company in e-signature and contract lifecycle management (CLM).

What you'll do

Docusign operates global, always-on services. To safeguard customer trust and empower our engineering teams, we are building the next generation of monitoring and troubleshooting capabilities. You will lead the engineering organization behind our AI-powered observability platform. Built to complement existing tools like Grafana, Prometheus, and ClickHouse/Azure Data Explorer, this platform addresses the challenges of information overload by making observability accessible to everyone. You will manage a multidisciplinary team of backend engineers, machine learning engineers, and data scientists to build predictive, automated insights while running a highly critical service with strict SLAs.

This position is a people manager role reporting to the Senior Director, Software Engineering.

Requirements

  • 12+ years in software/infra/platform engineering, including 6+ years operating high-scale, 24x7 back-end systems.
  • 3+ years managing managers and senior ICs, specifically leading multidisciplinary teams across domains like backend engineering and machine learning.
  • Proven ownership of mission-critical services with strict SLOs, incident management, and change control.
  • Experience hiring, developing, and retaining senior/principal talent and building inclusive, high-trust teams.

Nice To Haves

  • Familiarity with modern observability practices (metrics, logs, distributed tracing) and standards such as OpenTelemetry
  • Experience leading teams that build, train, and deploy machine learning models in production environments for AIOps, anomaly detection, or predictive analytics
  • Experience improving the performance and reliability of large telemetry/data pipelines at a multi-region scale
  • Exposure to multi-cloud environments and Kubernetes-based platforms

Responsibilities

  • Manage high-throughput, real-time observability pipelines processing massive volumes of telemetry data across multi-region, multi-cloud environments
  • Operate a Tier-0 data plane with strict SLOs, disciplined change management, and high availability requirements
  • Serve as the observability backbone for every engineering team's reliability and velocity, dramatically reducing the time from "something's wrong" to "here's the problem"
  • Set a clear 12-24 month vision for AIOps capabilities, focusing on automated troubleshooting workflows, proactive anomaly detection, and smart alert aggregation
  • Bridge the gap between robust backend infrastructure and applied machine learning, ensuring models for auto-threshold estimation and pattern recognition are effectively trained and reliably deployed at scale
  • Drive availability, durability, incident readiness, and disaster recovery for the observability plane; run regular resilience drills
  • Lead, hire, and grow a senior-heavy team of backend engineers, ML engineers, and applied scientists. Build an architecture culture, clear career paths, and a high-judgment, high-ownership operating model
  • Own the annual operating plan for the AIOps platform's capacity, availability, and budget, meeting or beating committed SLOs/SLAs
  • Drive the development of intelligent capabilities like automated impact analysis, ML-driven threshold estimation, and natural language interfaces to reduce alert noise and accelerate debugging
  • Continually improve ingestion latency, query performance, storage efficiency, and cost per unit while maintaining reliability through traffic spikes and deploys
  • Partner with SRE, Telemetry Platform, Security, Finance, and Product; make pragmatic build-vs-buy decisions; manage vendors and capacity commitments
  • Lead on-call and incident command for the observability platform

Benefits

  • Bonus: Sales personnel are eligible for variable incentive pay dependent on their achievement of pre-established sales goals. Non-Sales roles are eligible for a company bonus plan, which is calculated as a percentage of eligible wages and dependent on company performance.
  • Stock: This role is eligible to receive Restricted Stock Units (RSUs).
  • Global benefits provide options for the following:
      • Paid Time Off: earned time off, as well as paid company holidays based on region
      • Paid Parental Leave: take up to six months off with your child after birth, adoption or foster care placement
      • Full Health Benefits Plans: options for 100% employer paid and minimum employee contribution health plans from day one of employment
      • Retirement Plans: select retirement and pension programs with potential for employer contributions
      • Learning and Development: options for coaching, online courses and education reimbursements
      • Compassionate Care Leave: paid time off following the loss of a loved one and other life-changing events