We are seeking a Site Reliability Engineer Lead to own and evolve the reliability, scalability, and operational excellence of cloud-native data platforms running primarily on Google Cloud Platform (GCP). This role supports data systems that ingest, process, and serve large volumes of operational data from oilfield and energy environments. The ideal candidate is a cloud-first SRE with deep GCP experience, strong Python engineering skills, and a track record of leading reliability initiatives for data-intensive systems.

Lead SRE practices for GCP-based data platforms
Design and own SLIs, SLOs, error budgets, and reliability metrics
Build and maintain cloud-native observability (monitoring, logging, alerting)
Lead incident response for production cloud systems and drive postmortems
Partner with data engineering and platform teams to design reliable architectures
Automate operational workflows using Python
Drive improvements in CI/CD, infrastructure as code, and deployment safety
Mentor engineers and set SRE best practices across the team
Job Type
Full-time
Career Level
Mid Level
Number of Employees
5,001-10,000 employees