Platform Operations Manager

Leidos
Bethesda, MD
$154,050 - $278,475
Hybrid

About The Position

Leidos is excited to present an opportunity for a TS/SCI‑cleared Platform Operations Manager to join a high‑impact team driving the design, development, and deployment of a modern technology stack supporting the DOMEX Data Discovery Platform (D3P) Modernization Program. This role directly supports our customer’s mission to centralize and standardize the Tasking, Collection, Processing, Exploitation, and Dissemination (TCPED) of Open Source Intelligence (OSINT) across the Defense Intelligence Enterprise. You’ll be part of a mission‑focused, solutions‑oriented team that values inclusion, innovation, collaboration, and continuous professional growth. Although most work is performed on‑site at our customer location in Bethesda, MD, we offer a flexible schedule, and some tasks may be completed remotely. As a Platform Operations Manager, you will ensure the availability, reliability, and performance of a full‑stack, containerized microservices platform. You’ll help cultivate a strong DevSecOps culture and collaborate closely with systems engineering, architecture, development, security, operations, and integration teams in a fast‑paced environment.
You will partner with a multidisciplinary team of systems engineers, developers, integrators, and system administrators to lead efforts in the following areas:

  • System Reliability & Performance: Ensuring uptime, performance, and capacity planning for a large‑scale big data production platform with a microservice architecture running on Kubernetes, Elasticsearch, PostgreSQL, Kafka, and technologies such as Java, Python, React, and low‑code tools like Appian
  • Monitoring & Observability: Leveraging monitoring tools to proactively detect and resolve issues
  • Incident Response: Leading triage, troubleshooting, root‑cause analysis, and post‑incident reviews
  • SLIs & SLOs: Defining and tracking reliability metrics
  • Management Oversight: Leading a team of system administrators supporting a help desk during core hours; setting technical standards and mentoring staff
  • Technical Leadership: Partnering with systems engineers to design solutions, contribute to documentation, and support architectural alignment
  • SAFe Agile: Participating in release planning, scrums, design sessions, bug triage, and cross‑team coordination

You bring enthusiasm, strong collaboration skills, and the ability to work effectively with teammates across varying technical backgrounds.

Requirements

  • BS in Engineering, Computer Science, Systems Engineering, or a related field with 15+ years of relevant experience, or a Master’s with 13+ years; additional experience may substitute for a degree
  • Active TS/SCI clearance with the ability to obtain and maintain a polygraph
  • At least one DoD 8570.01‑M IAT Level II or higher certification (e.g., Security+ CE, CySA+, CCNA Security, SSCP, or CISSP, with the ISC2 Associate designation also accepted)
  • Ability to obtain Privileged User Account (PUA) certification
  • Experience with Kubernetes, GitLab pipelines, Linux, and containerized environments
  • Experience supporting enterprise‑scale production systems
  • Experience with cloud services (preferably AWS) and cloud infrastructure
  • Familiarity with Elasticsearch, PostgreSQL, Logstash, Kibana, and Keycloak
  • Demonstrated success in cross‑functional coordination and execution
  • Team leadership and line management experience
  • Strong communication skills and the ability to perform under pressure during incidents

Nice-to-Haves

  • Experience with Agile methodologies
  • Development experience (Bash, PowerShell, Salt, Python, Groovy, Java, etc.)
  • Experience with Appian or other low‑code platforms
  • Experience with technologies such as Kafka, AMQP/JMS, Prometheus/Grafana, GPU‑based Kubernetes, Salt automation, Nexus, or GraphQL
  • Knowledge of security best practices (authentication and authorization, secrets management, data protection)
  • Infrastructure‑as‑code experience (CloudFormation, Terraform, Pulumi)
  • AWS cloud certifications

Responsibilities

  • Ensuring uptime, performance, and capacity planning for a large‑scale big data production platform with a microservice architecture running on Kubernetes, Elasticsearch, PostgreSQL, Kafka, and technologies such as Java, Python, React, and low‑code tools like Appian
  • Leveraging monitoring tools to proactively detect and resolve issues
  • Leading triage, troubleshooting, root‑cause analysis, and post‑incident reviews
  • Defining and tracking reliability metrics, including service level indicators (SLIs) and service level objectives (SLOs)
  • Leading a team of system administrators supporting a help desk during core hours; setting technical standards and mentoring staff
  • Partnering with systems engineers to design solutions, contribute to documentation, and support architectural alignment
  • Participating in release planning, scrums, design sessions, bug triage, and cross‑team coordination

Benefits

  • Competitive compensation
  • Health and wellness programs
  • Income protection
  • Paid leave
  • Retirement