Senior Software Engineer, Cloud

Aerospike

Posted 26d ago • $169,000 - $195,000

About The Position

Aerospike is the real-time database for mission-critical use cases and workloads, including machine learning, generative AI, and agentic AI. Aerospike powers millions of transactions per second with millisecond latency, at a fraction of the total cost of ownership of other databases. Global leaders, including Adobe, Airtel, Barclays, Criteo, DBS Bank, Experian, Grab, HDFC Bank, PayPal, Sony Interactive Entertainment, The Trade Desk, and Wayfair, rely on Aerospike for customer 360, fraud detection, real-time bidding, profile stores, recommendation engines, and other use cases.

At Aerospike, we dream big and deliver even bigger. Our mission is to unleash the power of the world’s real-time data with a database built for infinite scale, speed, and sustainability. If you’re ready to shape the future of data, join us.

We’re growing fast and investing deeply in our Cloud Platform, a multi-cloud, multi-tenant platform that brings the power of Aerospike to customers with maximum simplicity, speed, and scale. We have a particular focus on the systems that provision, operate, and monitor Aerospike databases running in customer-dedicated Kubernetes clusters.

We’re looking for a Senior Software Engineer to join our Cloud team and play a key role in designing and building the infrastructure orchestration, control loops, and operational systems that power Aerospike Cloud. Your work will directly impact the reliability, scalability, and safety of production database clusters used by customers worldwide.

What You’ll Do

Drive real impact. You’ll build and evolve the systems that provision and manage Aerospike Cloud clusters, spanning Kubernetes, cloud infrastructure, storage, networking, and long-running workflows.
Collaborate deeply. You’ll work closely with product managers, architects, control-plane engineers, and SREs to deliver reliable platform capabilities while balancing customer needs, operational safety, and long-term maintainability.
Build for reliability and scale. From safely orchestrating cluster lifecycle operations to handling failure modes in cloud infrastructure, you’ll design and implement systems that behave predictably under load and degrade gracefully when things go wrong.
Elevate operational quality. You’ll help define best practices around rollout safety, workflow versioning, observability, and incident prevention, raising the bar for how we operate stateful systems in Kubernetes.

Requirements

At least 5 years of relevant software engineering experience
Strong foundation in computer science, distributed systems, and debugging complex systems
Proficiency in at least one statically typed backend language (preferably Go)
Experience developing and operating distributed systems in production
Hands-on experience with Kubernetes and containerized workloads
Experience with at least one major cloud provider (AWS preferred)
Experience designing, deploying, and operating stateful systems
Familiarity with Git-based workflows and CI/CD pipelines

Nice To Haves

Proficiency in Go
Experience with Terraform and infrastructure-as-code
Experience with Kubernetes Operators, controllers, or CRDs
Experience with workflow orchestration systems (Temporal, Cadence, Airflow, etc.)
Strong understanding of cloud networking concepts (VPCs, subnets, IP management, load balancers)
Experience with observability stacks (Prometheus, OpenTelemetry, Datadog)
Experience operating storage systems (EBS, instance store, backups)

Responsibilities

Design, implement, and maintain components and workloads responsible for provisioning and managing Kubernetes-based Aerospike clusters
Build and evolve workflows used to orchestrate long-running infrastructure and database lifecycle operations
Develop Kubernetes-native systems using controllers, operators, and CRDs
Own cloud infrastructure automation using Terraform, including VPCs, EKS clusters, IAM, storage, and networking
Design and maintain persistent storage lifecycles involving EBS volumes, local NVMe instance storage, and backups
Diagnose and resolve complex production issues spanning Kubernetes, cloud provider APIs, and distributed systems
Improve observability through metrics, logs, and alerts using Prometheus, OpenTelemetry, and Datadog
Collaborate across teams to evolve architecture, improve reliability, and prevent operational regressions
