About The Position

Engineered to outperform, Teraswitch is on a mission to provide high-performance infrastructure services for critical workloads. With 20+ datacenter locations around the world interconnected by our low-latency global backbone network, we are the class leader in performance bare metal hosting and are rapidly expanding into additional infrastructure services.

The Job

The Infrastructure Engineering team at Teraswitch is responsible for the compute, storage, and platform infrastructure that powers our products and internal operations. This senior/staff-level role will architect and lead our global, self-hosted Kubernetes deployment and help drive our cloud-native approach to both internal and customer-facing services. You’ll design for a self-hosted (bare metal) environment, without relying on cloud-managed control planes, load balancers, or databases.

This role will also build reusable platform capabilities and operating models to facilitate Kubernetes adoption by other teams, and help drive and support adoption of cloud-native practices across the organization. While this role has a Kubernetes / platform focus, as a senior member of the Infrastructure Engineering team, you’ll also be expected to cross-train and contribute broadly across infrastructure domains as we grow the team.

Requirements

  • Strong experience operating production Kubernetes, including cluster lifecycle responsibilities (e.g. provisioning, management, upgrades, observability, troubleshooting, storage, networking)
  • Experience in self-hosted Kubernetes architectures (i.e. without relying on cloud-managed control planes and managed apps)
  • Experience with internal platform capabilities (GitOps/CI/CD integration, paved roads, developer enablement)
  • Experience with cloud-native observability/monitoring (metrics, logs, traces, alerting)
  • Strong Linux systems and networking expertise
  • Comfortable working in a fast-paced, results-oriented environment
  • Committed to operational best practices and security by design

Nice To Haves

  • Experience with multi-cluster, multi-region Kubernetes management (including app deployments and HA/DR strategy)
  • Deep Kubernetes storage knowledge; hands-on experience managing and integrating persistent software-defined cluster storage (e.g. Longhorn, Ceph, VAST, etc)
  • Deep Kubernetes networking knowledge; hands-on experience with advanced cluster networking (e.g. BGP and other mechanisms for workload HA)
  • Solid understanding of and experience implementing Kubernetes security best practices (secure network policies / workload security, RBAC, secrets management, vulnerability management, compliance-oriented controls/reporting)
  • Cloud-native database self-hosting / management experience (MySQL, Postgres) - for example, using tools like CloudNativePG or Vitess
  • Experience with KubeVirt and/or other VM-on-Kubernetes deployments
  • Production-grade, cloud-native observability design (metrics/logs/traces correlation, OpenTelemetry pipelines, Prometheus/Grafana)
  • Service / hosting provider experience (multi-tenant systems, automation-first operations, scalable and secure design)
  • Automation experience - scripting (Python, bash, etc) and/or configuration management (Ansible, etc)
  • Experience with CI/CD and/or GitOps deployment models and workflows
  • Experience in other Infrastructure team domains - e.g. distributed storage systems (block or object storage services), KVM-based virtualization (cloud services), and/or bare metal automation / fleet management

Responsibilities

  • Architect and lead our globally distributed, self-hosted Kubernetes deployment (including provisioning and management, multi-site app deployments, HA/DR strategy, failure domains, etc)
  • Define and implement our Kubernetes storage and networking strategies
  • Define and implement our Kubernetes security posture: secure network policies, RBAC, container security, secrets management, vulnerability management, compliance-oriented controls and reporting
  • Drive modern, cloud-native observability/monitoring for the platform and workloads
  • Deliver platform capabilities for developers: namespaces/tenancy model, “paved road” patterns and templates, standard ingress/certs/secrets approaches, documentation
  • Collaborate with and support the Software team and other internal stakeholders on cloud-native deployments, acting as an internal cloud-native SME
  • Cross-train with the rest of the Infrastructure Engineering team and contribute broadly to the compute, storage, and platform infrastructure that powers Teraswitch products and internal operations
  • Participate in an on-call system supporting critical production systems

Benefits

  • Health, Dental, and Vision Insurance
  • 401(k) with company profit sharing
  • PTO and 11 Company Paid Holidays