We are seeking a Lead Site Reliability Engineer to manage and optimize data platform operations, ensuring high availability, scalability, and performance. In this role, you will oversee cloud infrastructure, end-to-end data pipelines, and containerized applications while collaborating closely with data science, ML/GenAI, and development teams. You will implement Infrastructure as Code, instrument and monitor systems for observability, and troubleshoot complex issues to maintain operational excellence. The ideal candidate thrives in a fast-paced environment, embraces automation, and drives innovation across cloud and data platforms. This role combines hands-on technical expertise with strategic system design and has a direct impact on business-critical data workflows.
Job Type: Full-time
Career Level: Senior
Number of Employees: 11-50