Staff Site Reliability Engineer - Cloud Platform & Vehicle Telemetry

ALSO•Palo Alto, CA

$200,000 - $245,000

About The Position

We’re ALSO, an electric mobility company originally conceived as a part of Rivian. We’re a passionate team of builders, dreamers, doers and innovators, focused on creating entirely new (not to mention, innovative and delightful) vertically integrated, small EVs designed to meet the global mobility challenges of today and tomorrow. Our mission is to inspire everyone to ride ALSO, replacing many local car, truck and SUV miles with ones on vehicles that are more affordable, more enjoyable and 10-50x more efficient.

ALSO is looking for a Staff Site Reliability Engineer to build and operate scalable, cloud-native systems supporting vehicle telemetry, fleet management, and cohort-based analytics. This role requires deep expertise in distributed systems, Kubernetes, AWS infrastructure, and data pipelines, with strong ownership of reliability and operational excellence.

Requirements

10+ years of experience in data engineering and/or backend platform engineering, operating production systems at scale
Deep hands-on experience with large-scale telemetry or IoT data, including high-throughput and low-latency ingestion
Strong expertise in AWS data and infrastructure services (S3, Kinesis/MSK, Glue, EMR, Lambda, Step Functions, EventBridge)
Proven experience owning end-to-end ETL/ELT infrastructure (batch and streaming)
Solid understanding of streaming architectures using Kafka or equivalent systems and time-series–optimized storage patterns
Strong backend engineering skills using Python and/or Java/Scala, including API design (REST/gRPC) and distributed systems fundamentals
Experience with data platform architectures such as data lakes and lakehouses, schema registries, and metadata systems
Hands-on experience with orchestration frameworks (Airflow, MWAA, Dagster) and production-grade observability (logging, metrics, tracing)
Infrastructure-as-code expertise using CloudFormation, Terraform, or CDK to manage scalable and reliable systems
A track record of building highly reliable, fault-tolerant systems with clear ownership, strong SLAs, and operational excellence

Nice To Haves

Experience with vehicle, sensor, or IoT data
Streaming-first architectures
Experience supporting real-time inference pipelines
Prior Staff-level ownership of data platforms

Responsibilities

Operationalize microservices-based platforms running on Kubernetes (EKS) and AWS ECS
Optimize vehicle telemetry ingestion and data pipelines using streaming systems (Kafka/Kinesis) for high-throughput, low-latency workloads
Lead reliability engineering efforts including SLOs, SLIs, and incident response
Implement advanced observability (Datadog, Grafana, tracing, logging pipelines)
Develop and maintain API Gateway-based service architectures
Own on-call rotations, PagerDuty schedules, and incident response frameworks
Automate infrastructure provisioning using Terraform and CI/CD tools (ArgoCD, Concourse)
Improve system resilience, failover strategies, and multi-region reliability
Partner with product and platform teams to build vehicle lifecycle and cohort management systems

Benefits

Robust health coverage. Excellent health, dental and vision insurance covered up to 100% by ALSO with FSA & HSA options.
One Medical membership and dedicated insurance advocates.
Rich fertility and family building benefits with Progyny.
Flexible time off.
401(k) match.
