Senior Software Engineer, Cloud

Aerospike
$169,000 - $195,000

About The Position

Aerospike is the real-time database for mission-critical use cases and workloads, including machine learning, generative AI, and agentic AI. Aerospike powers millions of transactions per second with millisecond latency, at a fraction of the total cost of ownership of other databases. Global leaders, including Adobe, Airtel, Barclays, Criteo, DBS Bank, Experian, Grab, HDFC Bank, PayPal, Sony Interactive Entertainment, The Trade Desk, and Wayfair, rely on Aerospike for customer 360, fraud detection, real-time bidding, profile stores, recommendation engines, and other use cases.

At Aerospike, we dream big and deliver even bigger. Our mission is to unleash the power of the world’s real-time data with a database built for infinite scale, speed, and sustainability. If you’re ready to shape the future of data, join us.

We’re growing fast and investing deeply in our Cloud Platform, a multi-cloud, multi-tenant platform that brings the power of Aerospike to customers with maximum simplicity, speed, and scale. We are particularly focused on the systems that provision, operate, and monitor Aerospike databases running in customer-dedicated Kubernetes clusters. We’re looking for a Senior Software Engineer to join our Cloud team and play a key role in designing and building the infrastructure orchestration, control loops, and operational systems that power Aerospike Cloud. Your work will directly affect the reliability, scalability, and safety of production database clusters used by customers worldwide.

What You’ll Do

  • Drive real impact. You’ll build and evolve the systems that provision and manage Aerospike Cloud clusters, spanning Kubernetes, cloud infrastructure, storage, networking, and long-running workflows.
  • Collaborate deeply. You’ll work closely with product managers, architects, control-plane engineers, and SREs to deliver reliable platform capabilities while balancing customer needs, operational safety, and long-term maintainability.
  • Build for reliability and scale. From safely orchestrating cluster lifecycle operations to handling failure modes in cloud infrastructure, you’ll design and implement systems that behave predictably under load and degrade gracefully when things go wrong.
  • Elevate operational quality. You’ll help define best practices for rollout safety, workflow versioning, observability, and incident prevention, raising the bar for how we operate stateful systems in Kubernetes.

Requirements

  • At least 5 years of relevant software engineering experience
  • Strong foundation in computer science, distributed systems, and debugging complex systems
  • Proficiency in at least one statically typed backend language (preferably Go)
  • Experience developing and operating distributed systems in production
  • Hands-on experience with Kubernetes and containerized workloads
  • Experience with at least one major cloud provider (AWS preferred)
  • Experience designing, deploying, and operating stateful systems
  • Familiarity with Git-based workflows and CI/CD pipelines

Nice To Haves

  • Proficiency in Go
  • Experience with Terraform and infrastructure-as-code
  • Experience with Kubernetes Operators, controllers, or CRDs
  • Experience with workflow orchestration systems (Temporal, Cadence, Airflow, etc.)
  • Strong understanding of cloud networking concepts (VPCs, subnets, IP management, load balancers)
  • Experience with observability stacks (Prometheus, OpenTelemetry, Datadog)
  • Experience operating storage systems (EBS, instance store, backups)

Responsibilities

  • Design, implement, and maintain components and workloads responsible for provisioning and managing Kubernetes-based Aerospike clusters
  • Build and evolve workflows used to orchestrate long-running infrastructure and database lifecycle operations
  • Develop Kubernetes-native systems using controllers, operators, and CRDs
  • Own cloud infrastructure automation using Terraform, including VPCs, EKS clusters, IAM, storage, and networking
  • Design and maintain persistent storage lifecycles involving EBS volumes, local NVMe instance storage, and backups
  • Diagnose and resolve complex production issues spanning Kubernetes, cloud provider APIs, and distributed systems
  • Improve observability through metrics, logs, and alerts using Prometheus, OpenTelemetry, and Datadog
  • Collaborate across teams to evolve architecture, improve reliability, and prevent operational regressions

What This Job Offers

Job Type

Full-time

Career Level

Mid Level

Education Level

No Education Listed

Number of Employees

51-100 employees

© 2024 Teal Labs, Inc