About The Position

We are looking for a Systems Development Engineer to build the automation, tooling, and operational infrastructure that keep our large-scale, mission-critical service reliable, secure, and efficient. In this role you will treat operations as a software problem: eliminating manual toil, hardening our deployment and monitoring systems, and ensuring our replication and recovery fleet runs flawlessly across a broad, heterogeneous environment. A key dimension of this role is breadth. DRS supports a wide range of operating systems (multiple Linux distributions and Windows versions) on both x86-64 and ARM64 (Graviton) architectures, so your automation and tooling must be robust across diverse OS and hardware combinations.

Requirements

  • Experience in automating, deploying, and supporting large-scale infrastructure
  • Experience programming with at least one modern language such as Python, Ruby, Golang, Java, C++, C#, or Rust
  • Experience with Linux/Unix
  • Experience with CI/CD pipelines and build processes

Nice To Haves

  • Experience with distributed systems at scale

Responsibilities

  • Design and build software that automates infrastructure provisioning, deployments, and recurring operational workflows, reducing manual effort and on-call burden across the DRS fleet.
  • Build and improve pipelines, deployment guardrails, and rollback mechanisms to ship changes safely across all regions and platform variants.
  • Develop and maintain tooling that works reliably across a wide range of operating systems (various Linux distributions and Windows) on both x86-64 and ARM64 (Graviton) architectures.
  • Implement monitoring, alarming, and self-healing systems to detect and remediate issues before they impact customers' replication and recovery operations.
  • Tune and scale the systems behind continuous replication, capacity management, and recovery orchestration to handle growth gracefully.
  • Drive down ticket and incident volume through durable, programmatic fixes; lead root-cause analysis and contribute to runbooks and operational best practices.
  • Partner with security teams to harden the service and remediate findings, ensuring fixes are deployed consistently across the fleet.
  • Build automation and tooling that serves multiple teams and raises the operational bar across DRS.

Benefits

  • health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and an option for supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage)
  • 401(k) matching
  • paid time off
  • parental leave