The Lead Platform Reliability Engineer (PRE) ensures the stability, performance, and scalability of the shared platform that supports internal AI solution development. The role combines software engineering, SRE practices, and operations to keep the platform reliable and developer-friendly. It involves operating scalable backend services that support high-traffic agent interactions, retrieval operations, and real-time execution flows. The PRE also maintains AI services runbooks, playbooks, and enablement for GOCC. Collaboration with global engineering, security, and AI governance teams is essential to ensure compliance with cross-geo regulations and Asia’s data residency requirements.
Job Type: Full-time
Career Level: Senior