Director, Engineering

DTEX Systems, Fremont, CA
Hybrid

About The Position

DTEX Systems is looking for a Director, Site Reliability Engineering (SRE) to lead operations within the SRE function. This role is responsible for setting priorities aligned to departmental strategies and business goals, executing against key performance metrics, and guiding the professional growth of leaders and individual contributors. The Director will ensure reliability, scalability, and robustness of DTEX’s technical infrastructure while thoughtfully adopting modern automation and AI-enabled capabilities to improve operational outcomes.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
  • 10+ years of experience in site reliability engineering, infrastructure engineering, or a related discipline.
  • Proven experience leading and managing high-performing technical teams.
  • Strong knowledge of system architecture, infrastructure design, and cloud-based systems.
  • Experience with on-premises and public cloud environments (Azure, AWS, VMware, Hyper-V, Cisco).
  • Proficiency in Python and Bash (PowerShell a plus).
  • Strong experience with Linux and Windows operating systems.
  • Deep understanding of networking, firewalls, and security principles.
  • Experience with configuration management and infrastructure-as-code tools (Salt, Pulumi; Terraform acceptable).
  • Hands-on experience with containerization and orchestration technologies (Docker/Podman, Kubernetes).
  • Exceptional problem-solving and troubleshooting skills.
  • Ability to communicate clearly, prioritize effectively, and make sound decisions under pressure.
  • Working knowledge of AI and machine learning concepts relevant to reliability engineering, including anomaly detection, predictive analytics, pattern recognition, and intelligent automation (hands-on model development not required).
  • Experience leveraging AI-assisted tools to improve incident response, root-cause analysis, capacity planning, and operational efficiency.
  • Ability to critically evaluate AI solutions, understanding tradeoffs related to data quality, bias, reliability, security, and operational risk.
  • Strong judgment in applying AI responsibly within reliability- and security-sensitive environments, maintaining human-in-the-loop decision making.
  • Ability to mentor teams and leaders in building AI literacy and automation-first thinking while maintaining high standards for operational excellence.

Responsibilities

  • Lead operations within the site reliability engineering department, setting priorities based on department strategies and goals.
  • Execute on key performance metrics to ensure the reliability, scalability, and robustness of production systems.
  • Guide the professional development and achievement of direct reports, fostering a culture of continuous learning, accountability, and operational excellence.
  • Oversee the development and implementation of new technologies to enhance system performance, stability, and security.
  • Manage teams to resolve system issues effectively and efficiently, ensuring minimal downtime and disruption to business operations.
  • Collaborate with Engineering, Product, Security, and Data leaders to align on strategic initiatives and drive cross-functional programs.
  • Partner with Product, Security, and Data teams to evaluate and operationalize AI-enabled capabilities such as anomaly detection, predictive monitoring, and intelligent alerting to improve system reliability and performance.
  • Drive responsible adoption of AI and automation within the SRE function, ensuring explainability, reliability, security, and appropriate human oversight in production environments.
  • Maintain the overall performance, stability, and resilience of the company’s technical infrastructure.

Benefits

  • Growth & Development – Opportunities for professional advancement and lifelong learning.
  • Flexibility – Hybrid or remote work options.
  • Comprehensive Benefits – Competitive compensation, equity participation, health and wellness benefits, and generous time-off policies.