Stack AV Site Reliability Engineers are responsible for ensuring that our production systems meet their service-level objectives. Through centralized observability and automation, the SRE team ensures the health, reliability, scalability, and performance of Stack AV's infrastructure. In this SRE role, you will ensure the robustness and operational readiness of the compute platform that powers large-scale autonomous systems development. The Compute Platform team enables engineers and researchers to efficiently run compute- and data-intensive workloads on Stack AV infrastructure, and it designs and operates the systems that orchestrate and scale batch and distributed workloads across our environments. You will work at the intersection of infrastructure, distributed systems, and developer experience, ensuring that complex workloads are reliable, efficient, and easy to run. As a Compute Platform SRE, you will be responsible for ensuring the operational readiness and maturity of the high-scale batch compute and workflow orchestration systems that power engineers across the company.
Job Type
Full-time
Career Level
Mid Level
Education Level
No Education Listed