Apache Druid Platform Admin / Engineer- Irving, TX

Cystems Logic IncIrving, TX
Onsite

About The Position

We have a long-term job opening for a Sr. Apache Druid Administrator in Irving, TX. This is a 12-month contract role requiring a minimum of 10 years of experience. The role involves designing, deploying, administering, and optimizing large-scale Apache Druid clusters supporting PB-scale datasets. The candidate will manage and troubleshoot Druid services, perform cluster upgrades, patching, capacity planning, and platform modernization. Responsibilities include monitoring cluster health, ingestion performance, query latency, segment distribution, and resource utilization, as well as troubleshooting ingestion failures, stuck tasks, compaction issues, retention policies, and segment management problems. Optimization of Druid queries, partitioning strategies, indexing specifications, and compaction configurations is also key. The role requires configuring and maintaining Druid metadata stores using MySQL, developing operational automation using Ansible and Infrastructure-as-Code methodologies, and creating dashboards, alerts, and monitoring solutions using Grafana, Prometheus, and logging platforms. Collaboration with platform teams for high availability, disaster recovery, backup, and security compliance, along with participation in production support, incident response, root cause analysis, and performance tuning initiatives, is expected. The administrator will also work with architects and business teams to understand analytical requirements and translate them into scalable solutions.

Requirements

  • Minimum of 10 Years of Experience
  • Apache Druid expertise

Responsibilities

  • Design, deploy, administer, and optimize large-scale Apache Druid clusters supporting PB-scale datasets.
  • Manage and troubleshoot Druid services.
  • Perform cluster upgrades, patching, capacity planning, and platform modernization activities.
  • Monitor cluster health, ingestion performance, query latency, segment distribution, and resource utilization.
  • Troubleshoot ingestion failures, stuck tasks, compaction issues, retention policies, and segment management problems.
  • Optimize Druid queries, partitioning strategies, indexing specifications, and compaction configurations.
  • Configure and maintain Druid metadata stores using MySQL.
  • Develop operational automation using Ansible and Infrastructure-as-Code methodologies.
  • Create dashboards, alerts, and monitoring solutions using Grafana, Prometheus, and logging platforms.
  • Work with platform teams to ensure high availability, disaster recovery, backup, and security compliance.
  • Participate in production support, incident response, root cause analysis, and performance tuning initiatives.
  • Collaborate with architects and business teams to understand analytical requirements and translate them into scalable solutions.
© 2026 Teal Labs, Inc
Privacy PolicyTerms of Service