About The Position

We are working with a long-standing anchor client to source a T3 Operations & Support Specialist (Storage) for a large-scale cloud-native platform programme supporting a major energy transmission operator in Germany. The platform is a service-oriented hybrid cloud environment providing application teams with self-service capabilities to develop, run and operate software products across private and public cloud infrastructure. In this role you will provide Tier-3 operational ownership for Storage products within Local Production (DE), handling complex incidents, deep troubleshooting and root cause analysis, and driving permanent fixes, automation and preventive measures across storage services.

Requirements

  • 5+ years in IT storage operations, service delivery or platform operations with demonstrated leadership in mission-critical environments
  • Proven experience implementing and leading Incident, Problem, Change and Release governance in production
  • Experience supporting platform workloads that rely on shared storage services
  • Storage types: File, Block and Object Storage via NetApp (ONTAP)
  • Protocols/services: NFS and object storage operations (S3-like concepts)
  • Kubernetes storage integration: CSI driver concepts and troubleshooting (PV/PVC lifecycle)
  • Virtualisation: experience operating storage virtualisation in enterprise environments
  • ITSM/collaboration tooling: Jira Service Management, Jira, Confluence
  • Fundamental understanding of core operations processes (Incident, Change, Problem management, ITSM) and SRE concepts
  • Experience gathering operational insights from monitoring/observability including SLI/SLA/SLO management and tracking
  • Hands-on experience documenting procedures and enforcing clear runbooks and playbooks
  • Hands-on experience with monitoring and logging tools (e.g. Prometheus, Grafana, Datadog, Mimir, Loki)
  • Understanding of modern platform operations (Kubernetes/containers, automation, observability) sufficient to govern specialists
  • Fluent English and German (C1 minimum in both)

Nice To Haves

  • Experience operating in regulated or high-availability industries (banking, telco, public sector, healthcare)
  • Experience with SRE practices (SLOs/SLIs, error budgets) and reliability management
  • Experience operating storage services that integrate with Kubernetes platforms
  • Familiarity with IaC-based provisioning and GitOps-driven operational patterns
  • Familiarity with enterprise DevOps toolchains (GitLab, JFrog Artifactory, Backstage, Harness)

Responsibilities

  • Providing T3 operational ownership for Storage services: handling complex incidents, deep troubleshooting and RCA, and driving permanent fixes and preventive measures
  • Ensuring operational readiness for storage changes: monitoring/alerting coverage, performance baselines, hardening, patch strategy, rollback and recovery procedures, and runbooks
  • Executing and improving standard operational procedures through automation (capacity checks, validation procedures, provisioning workflows)
  • Validating deployment artefacts from an operations perspective and enforcing quality assurance measures
  • Monitoring system health, performance metrics and service availability across multi-tenant environments
  • Identifying, analysing and resolving incidents to minimise service disruption, and triggering RCA and corrective actions
  • Implementing monitoring and logging strategies to support audit and compliance requirements
  • Performing routine security scans and remediating identified vulnerabilities

Benefits

  • Flexible working hours
  • The freedom to choose your own projects
  • Access to exciting projects in various industries
  • Support in advancing your career
  • Competitive pay
  • A dedicated team to help you with any questions you may have
  • The ability to work independently
  • Access to our strong network to help you achieve your professional goals