Senior Infrastructure Engineer - On-Prem

C-Serv

2d•Onsite

About The Position

Most organizations moved to the cloud and never looked back. But some of the most complex, mission-critical workloads still demand something different: the precision, control, and resilience that only on-prem infrastructure can deliver. We’re looking for a Senior Infrastructure Engineer who has spent real time building and operating enterprise-grade platform services outside the public cloud. Someone who knows how to bring cloud-like capabilities (think EMR, S3, Lambda, SQS) into on-prem and hybrid environments, and who thrives in the kind of work where you can’t just click “provision” and walk away. If you’ve designed, deployed, and operated big data infrastructure, distributed storage, caching layers, and observability stacks on bare metal or private cloud, this role was written for you.

Requirements

Proven production experience bringing cloud-like capabilities (EMR, S3, Lambda, SQS equivalents) to on-prem solutions.
Hands-on experience setting up Big Data infrastructure on-prem (e.g., Hadoop or an EMR-equivalent stack on bare metal).
Infrastructure as Code and delivery automation expertise: Terraform, ArgoCD, and GitHub Actions.
Strong GitOps and pipeline automation skills — you believe in repeatable, auditable deployments.
Advanced Kubernetes skills including cluster administration, not just application deployment.
Production experience with distributed storage infrastructure (MinIO or another S3-compatible object store).
Production experience with distributed cache infrastructure (Redis, Memcached).
Solid grounding in observability: OpenTelemetry, Prometheus + Grafana, and Loki.

Nice To Haves

Experience managing local artifact registries (Nexus or similar).
Hypervisor and OS-level troubleshooting and performance analysis.
Backup and recovery strategies for distributed data systems.
Capacity analysis and scalability forecasting — you plan ahead, not just react.

Responsibilities

Design, deploy, and operate enterprise-grade platform services across on-prem and hybrid cloud environments.
Build and maintain Infrastructure as Code pipelines using Terraform, ArgoCD, and GitHub Actions — bringing GitOps discipline to every deployment.
Run and scale Kubernetes clusters at an advanced level: not just deploying workloads, but owning cluster administration, networking, and security.
Stand up and manage Big Data infrastructure on-prem (Hadoop, EMR-equivalent), delivering the analytical power enterprises expect from the cloud without depending on it.
Architect and operate distributed storage systems (MinIO, S3-compatible) and distributed caching layers (Redis, Memcached) in production environments.
Build and maintain a full observability stack — OpenTelemetry, Prometheus, Grafana, and Loki — ensuring the team has the visibility to operate with confidence.