Infrastructure Engineer, Distributed Compute

Base Power Company
Austin, TX
Onsite

About The Position

Base is deploying thousands of computing nodes across the country, coordinating them as a single distributed system. We're looking for an Infrastructure Engineer to design, build, and operate the horizontal infrastructure that coordinates, orchestrates, and manages this distributed compute network — enabling device communication, task scheduling, state synchronization, and fleet management at scale. You'll own the backend systems and APIs that allow thousands of devices to reliably communicate with central infrastructure, track their state, receive updates, and execute coordinated commands. This is systems-level work: designing for failure, scale, cost efficiency, and operational simplicity. You'll work closely with device engineers who need reliable communication channels, product teams who need fleet management primitives, operations teams who need visibility and control, and hardware engineers who understand physical constraints. Your infrastructure is the nervous system of this product — it must be fast, reliable, and elegant.

Requirements

  • 5+ years building backend infrastructure or distributed systems, preferably at scale
  • Strong experience in Go, Python, Java, or equivalent backend languages
  • Deep understanding of distributed systems concepts: eventual consistency, state synchronization, failure handling
  • Experience building APIs and services that handle high scale and high concurrency
  • Familiarity with message queues or event streaming (Kafka, RabbitMQ, SQS, or similar)
  • Solid understanding of databases and data modeling — knowing when to use relational vs. NoSQL vs. specialized stores
  • Comfort with infrastructure-as-code and cloud platforms (AWS or GCP)
  • Proven ability to own complex systems end-to-end: design, implementation, deployment, and operational support

Nice To Haves

  • Experience building device management or IoT backend systems
  • Familiarity with Kubernetes and container orchestration
  • Background in energy, utilities, or other operational technology (OT) domains
  • Experience with distributed tracing and observability at scale (Datadog, Honeycomb, etc.)
  • Knowledge of fleet management, device provisioning, or OTA update systems
  • Exposure to consensus algorithms (Raft, Paxos) or distributed coordination (etcd, Zookeeper)
  • Experience with stream processing frameworks (Kafka Streams, Flink, etc.)
  • Experience operating systems in production with clear, well-maintained operational runbooks
  • Experience with data center orchestration systems and baseboard management controllers

Responsibilities

  • Design and build the core orchestration and coordination layer that manages device fleet operations — task distribution, state synchronization, health monitoring — with >99.9% availability.
  • Build backend systems that reliably handle device-to-cloud communication at scale, including message routing, acknowledgment, retry logic, and conflict resolution for concurrent updates.
  • Develop APIs and services that allow product teams to query device state, push updates, and execute commands on thousands of devices simultaneously without bottlenecks or data consistency issues.
  • Design architectures that scale horizontally from hundreds to millions of devices without re-architecture, while optimizing compute, storage, and network costs.
  • Implement monitoring, alerting, and operational runbooks that allow the team to understand and troubleshoot distributed system behavior in production.
  • Build reliable async communication patterns using message queues and event streaming, handling ordering guarantees, deduplication, and exactly-once semantics.
  • Own the database and storage layer decisions that support both operational and analytical workloads — knowing when to use relational databases, NoSQL stores, or specialized systems.
  • Partner with hardware and device teams to understand their needs and translate them into scalable, reliable backend services.
  • Write infrastructure-as-code that is maintainable, tested, and reproducible, enabling safe and rapid iteration.

Benefits

  • The opportunity to do the best work of your life at Base.