Hyperbolic Labs is on a mission to democratize AI by breaking down the barriers to computing power with our Open-Access AI Cloud. By aggregating computing resources across the globe, we offer an innovative GPU marketplace and AI inference service that promise affordability and accessibility for all. As pioneers at the intersection of AI and open-source technology, we believe in an open future where AI innovation is limited only by imagination, not by access to resources. We're looking for forward-thinking individuals who share our passion for making AI universally accessible, secure, and affordable. Join us in building a platform that empowers innovators everywhere to turn their visionary AI projects into reality. As we prepare for growth after our Series A, our team — led by co-founders with PhDs in AI, Math, and Computer Science — is poised to redefine computing. We're seeking a Site Reliability Engineer to ensure Hyperbolic's GPU marketplace and AI infrastructure operate with exceptional reliability, performance, and security. As an aggregator of compute resources from hundreds of global suppliers, our SLOs, trust, and economic efficiency are product-critical. You'll be responsible for defining and maintaining service level objectives for job success rates, building robust incident response systems, managing capacity across our distributed GPU network, and implementing secure rollout and rollback mechanisms that keep our platform running smoothly 24/7. In this role, you'll establish the reliability standards that define customer trust in our platform, design monitoring and alerting systems that provide deep visibility into our infrastructure, build automation for capacity management and resource allocation, lead incident response and post-mortem processes, and work closely with engineering teams to improve system resilience. You'll also focus on security and infrastructure hardening, ensuring strong isolation between tenants and suppliers, implementing key management systems, and building compliance frameworks. This is a high-impact position where your work directly influences our ability to deliver on our promise of affordable, accessible AI compute at scale.
Stand Out From the Crowd
Upload your resume and get instant feedback on how well it matches this job.
Job Type
Full-time
Career Level
Senior
Education Level
No Education Listed