The Observability Platform team, part of eBay's core Site Reliability Engineering (SRE) organization, is dedicated to enhancing the reliability, performance, and efficiency of eBay's global platform. Our mission is to build intelligent, scalable tools and solutions that empower our SRE and domain engineering teams to maintain operational excellence. We develop and maintain a suite of advanced, AI-driven systems that draw on a wealth of operational data. Our real-time anomaly detection platform analyzes high-volume time-series metrics to predict and flag service degradations. We automate troubleshooting with a sophisticated root cause analysis engine that correlates metrics, events, logs, and traces to pinpoint failure origins. Furthermore, we are pioneering the use of GenAI to build an LLM-based agentic system that automates complex operational tasks, along with a novel suite of AI-powered explainability tools that clarify the behavior of distributed systems.
Job Type
Full-time
Career Level
Senior