As a Senior Site Reliability Engineer, you will be responsible for designing, implementing, and operating scalable, reliable, and secure infrastructure to support large-scale AI and HPC workloads. You will play a key role in building and maintaining CI/CD pipelines, Kubernetes-based environments, and observability systems that ensure high availability and performance across globally distributed platforms. Working closely with engineering, product, and operations teams, you will drive automation, enforce SRE best practices, and contribute to a resilient and efficient infrastructure ecosystem that supports mission-critical applications.
Job Type
Full-time
Career Level
Senior