Cox Automotive is looking for a Senior Site Reliability Engineer (SRE) to join our Manheim Logistics SRE team. The SRE team designs and maintains the AWS infrastructure and deployment pipelines for Manheim Logistics' 15+ development teams. The team has standardized on a Docker-based infrastructure and is adding functionality to support new development team requests and architectural patterns such as Lambda, Step Functions, and Fargate. The SRE team has a strong focus on infrastructure as code (IaC) with Terraform and on best practices such as least-privilege access and proactive monitoring and alerting. This role will work directly with a release train and help with IaC and SRE activities such as improving monitoring and alerting, defining an error budget, and assisting with DevSecOps.

As a Senior Site Reliability Engineer at Cox Automotive you will:
Take complex problems and come up with technically sound solutions
Work with and define SLOs, error budgets, and related reliability targets
Bring an innate curiosity about how things work
Design and help author software tools that reliably manage application delivery and performance
Design and help set up and maintain application monitoring and alerting
Engage with engineering teams to ensure best practices are implemented
Improve the predictability and reliability of software releases, workflows, and software in operation
Reduce mean time to recovery (MTTR) by troubleshooting, monitoring, alerting, and automating recovery
Apply solid written communication, problem-solving, and process management skills
Job Type: Full-time
Career Level: Mid Level
Number of Employees: 5,001-10,000