We are seeking a Site Reliability Engineer to own the reliability, scalability, and operational excellence of Hammerhead's AI-driven power orchestration platform. Our software runs in production data centers around the world, where real-time decisions directly affect gigawatts of compute infrastructure. Availability, latency, and correctness are not negotiable. You will work at the boundary between software and infrastructure, building the systems that deploy, monitor, and protect Hammerhead's platform in production. You will partner with engineering teams to establish SLOs, automate toil, accelerate releases, and ensure that when things go wrong, we know fast and recover faster. This is a foundational SRE role. You will be the first dedicated hire in this function. You will set the standard for how Hammerhead runs software in production. You will report to the Head of Engineering.
Stand Out From the Crowd
Upload your resume and get instant feedback on how well it matches this job.
Job Type
Full-time
Career Level
Mid Level
Education Level
No Education Listed