Infrastructure Engineer, Distributed Compute

Base Power Company • Austin, TX

Onsite

About The Position

Base is deploying thousands of computing nodes across the country and coordinating them as a single distributed system. We're looking for an Infrastructure Engineer to design, build, and operate the horizontal infrastructure that coordinates, orchestrates, and manages this distributed compute network, enabling device communication, task scheduling, state synchronization, and fleet management at scale.

You'll own the backend systems and APIs that allow thousands of devices to reliably communicate with central infrastructure, track their state, receive updates, and execute coordinated commands. This is systems-level work: designing for failure, scale, cost efficiency, and operational simplicity.

You'll work closely with device engineers who need reliable communication channels, product teams who need fleet management primitives, operations teams who need visibility and control, and hardware engineers who understand physical constraints. Your infrastructure is the nervous system of this product: it must be fast, reliable, and elegant.

Requirements

5+ years building backend infrastructure or distributed systems, preferably at scale
Strong experience in Go, Python, Java, or equivalent backend languages
Deep understanding of distributed systems concepts: eventual consistency, state synchronization, failure handling
Experience building APIs and services that handle high scale and high concurrency
Familiarity with message queues or event streaming (Kafka, RabbitMQ, SQS, or similar)
Solid understanding of databases and data modeling — knowing when to use relational vs. NoSQL vs. specialized stores
Comfort with infrastructure-as-code and cloud platforms (AWS or GCP)
Proven ability to own complex systems end-to-end: design, implementation, deployment, and operational support

Nice To Haves

Experience building device management or IoT backend systems
Familiarity with Kubernetes and container orchestration
Background in energy, utilities, or other operational technology (OT) domains
Experience with distributed tracing and observability at scale (Datadog, Honeycomb, etc.)
Knowledge of fleet management, device provisioning, or OTA update systems
Exposure to consensus algorithms (Raft, Paxos) or distributed coordination (etcd, ZooKeeper)
Experience with stream processing frameworks (Kafka Streams, Flink, etc.)
Experience operating systems in production with clear, well-maintained operational runbooks
Experience with data center orchestration systems and baseboard management controllers

Responsibilities

Design and build the core orchestration and coordination layer that manages device fleet operations — task distribution, state synchronization, health monitoring — with >99.9% availability.
Build backend systems that reliably handle device-to-cloud communication at scale, including message routing, acknowledgment, retry logic, and conflict resolution for concurrent updates.
Develop APIs and services that allow product teams to query device state, push updates, and execute commands on thousands of devices simultaneously without bottlenecks or data consistency issues.
Design architectures that scale horizontally from hundreds to millions of devices without re-architecture, while optimizing compute, storage, and network costs.
Implement monitoring, alerting, and operational runbooks that allow the team to understand and troubleshoot distributed system behavior in production.
Build reliable async communication patterns using message queues and event streaming, handling ordering guarantees, deduplication, and exactly-once semantics.
Own the database and storage layer decisions that support both operational and analytical workloads — knowing when to use relational databases, NoSQL stores, or specialized systems.
Partner with hardware and device teams to understand their needs and translate them into scalable, reliable backend services.
Write infrastructure-as-code that is maintainable, tested, and reproducible, enabling safe and rapid iteration.