About The Position

Engineered to outperform, Teraswitch is on a mission to provide high-performance infrastructure services for critical workloads. With 20+ datacenter locations around the world interconnected by our low-latency global backbone network, we are the class leader in performance bare metal hosting and are rapidly expanding into additional infrastructure services.

The Job

The Infrastructure Engineering team at Teraswitch is responsible for the compute, storage, and platform infrastructure that powers our products and internal operations. This senior/staff-level role is focused on building provider-grade hosted compute and storage services: specifically, a KVM-based VM product and a distributed object (S3) and block (NVMe/TCP) storage product. Qualified candidates will have depth in at least one of these areas.

You will help architect and build cloud-scale, globally distributed products for a high-performance infrastructure provider, with an emphasis on automation, scalability, and security by design. While this role focuses on compute and storage services, as a senior member of the Infrastructure Engineering team you’ll also be expected to cross-train and contribute broadly across infrastructure domains as the team grows.

Requirements

  • Strong Linux systems and networking expertise, plus production operations experience
  • Depth in at least one of the following:
      • Compute/virtualization: KVM/QEMU, libvirt, and/or platforms such as Proxmox/OpenStack; image pipelines; fleet operations; multi-tenant considerations
      • Distributed storage services: experience with distributed storage platforms (Ceph, VAST, Weka, or similar) and/or managing block/object storage offerings; public/multi-tenant deployment experience is a plus
  • Automation: experience with scripting (Python, Bash, etc.) and/or configuration management (Ansible or similar)
  • Experience with observability/monitoring systems (metrics, logs, traces, alerting) and using them to enhance production service reliability
  • Comfortable working in a fast-paced, results-oriented environment
  • Committed to operational best practices and security by design

Nice To Haves

  • Service / hosting provider experience (multi-tenant systems, automation-first operations, scalable and secure design)
  • Experience with VPS/KVM hosting at scale, including networking and security
  • Experience with distributed storage systems such as Ceph, Weka, or VAST, particularly in a service provider environment
  • Expertise in object storage/S3 services: gateway/front-door patterns (F5/Nginx/HAProxy), networking, durability, and security
  • Strong networking fundamentals relevant to provider environments (routing/segmentation, IPAM/DHCP/DNS integration)
  • Cloud-native observability/monitoring (e.g. Prometheus, Grafana, OpenTelemetry)
  • Kubernetes and cloud-native (CNCF) ecosystem experience
  • Demonstrated ability to design and operate automation-first infrastructure at scale
  • Experience in other Infrastructure team domains, e.g. self-hosted Kubernetes deployment/management and/or bare metal automation and fleet management

Responsibilities

  • Design and implement provider-scale, globally distributed hosted services, with a focus on compute (KVM-based cloud), storage (distributed object and block services), or both
  • Compute track: Evaluate/design, implement, and manage a KVM-based cloud compute platform
  • Storage track: Evaluate, implement, and manage a distributed storage platform (Ceph, Weka, VAST, etc.) that supports object (S3) and block (NVMe/TCP) protocols
  • Define provisioning workflows, node/fleet management, and scalable operations
  • Integrate service networking primitives (IPAM, DHCP, DNS) and customer-facing interfaces into the product
  • Design multi-tenant provisioning and controls: isolation boundaries, quotas/limits, metering, and security
  • Build automation and tooling for global deployments of these products: upgrades, capacity expansion, failure handling, rebalancing
  • Implement robust observability for these products to enhance production service reliability (metrics, logs, traces; dashboards; actionable alerting)
  • Collaborate with the Software team to integrate these products with our customer control plane (portal, API) and billing systems, ensuring robust customer-driven lifecycle management
  • Cross-train with the rest of the Infrastructure Engineering team and contribute broadly to the compute, storage, and platform infrastructure that powers Teraswitch products and internal operations
  • Participate in an on-call rotation supporting critical production systems

Benefits

  • Health, Dental, and Vision Insurance
  • 401(k) with company profit sharing
  • PTO and 11 Company Paid Holidays

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: No Education Listed
  • Number of Employees: 1-10 employees
