As a Software Engineer on our Site Reliability team at Sierra, you will be responsible for defining and building the foundation of reliability, observability, and scalability across Sierra’s AI-driven infrastructure. You’ll partner closely with our core engineering and product teams to ensure our systems are highly available, efficient, and built for growth.

Own Sierra’s observability stack (monitoring, alerting, logging, and tracing) to give engineers clear visibility into system health and performance.
Partner with product and platform engineers to design systems that are reliable and scalable from day one, not as an afterthought.
Design and implement scalable, reliable, and secure cloud infrastructure (AWS) using Terraform and modern DevOps tooling.
Improve the reliability and scalability of our LLM deployments, ensuring robust, performant, and cost-effective operation.
Lead improvements to deployment pipelines, CI/CD tooling, and incident management processes to reduce downtime and response time.
Define the foundation of SRE practices at Sierra, influencing culture, tooling, and best practices across the engineering org.
Job Type: Full-time
Career Level: Mid Level
Education Level: Bachelor's degree
Number of Employees: 251-500 employees