Director - AI Platform Engineering

eBay • San Jose, CA

$240,800 - $321,500 • Remote

About The Position

At eBay, we're more than a global ecommerce leader: we're changing the way the world shops and sells. Our platform empowers millions of buyers and sellers in more than 190 markets around the world. We're committed to pushing boundaries and leaving our mark as we reinvent the future of ecommerce for enthusiasts.

Our customers are our compass, authenticity thrives, bold ideas are welcome, and everyone can bring their unique selves to work, every day. We're in this together, sustaining the future of our customers, our company, and our planet. Join a team of passionate thinkers, innovators, and dreamers, and help us connect people and build communities to create economic opportunity for all.

Director, AI Platform Development

About the Role

Join eBay as we develop a next-generation AI platform for intelligent marketplace experiences! We are seeking a Director of AI Platform Engineering to lead the infrastructure, services, and developer experiences that enable large-scale AI development and production deployment. This is an outstanding opportunity to build an ambitious AI platform that powers our global ecommerce engine.

You will drive the strategy for core capabilities spanning LLM inference, GPU infrastructure, MLOps, and agentic platforms. This role requires a proven technical leader who can scale engineering teams and collaborate with Research, Product, and Infrastructure partners.

Requirements

10+ years of overall experience, with 5+ years of engineering leadership experience that includes managing managers and/or leading multiple engineering teams.
Strong understanding of AI/ML platform architecture, including training, experimentation, model management, deployment, inference, observability, and developer workflows.
Familiarity with LLM inference, RAG systems, model serving runtimes, latency and efficiency optimization, evaluation, safety, and monitoring.
Ability to translate ambiguous business and research needs into clear platform strategy, technical roadmaps, operating models, and measurable execution plans.
Excellent communication skills, with the ability to influence senior technical, product, and executive collaborators.

Nice To Haves

Experience supporting both training and inference workloads across on-premises and cloud-based GPU environments.
Experience with Ray/KubeRay, vLLM, PyTorch, TensorRT, SGLang, MLflow, JupyterHub, or equivalent technologies.

Responsibilities

Lead teams building production-grade LLM inference and serving platforms, including capacity planning, runtime optimization, benchmarking, release engineering, reliability, and cost efficiency.
Build and scale an agentic AI platform that enables teams to develop, deploy, monitor, and govern AI agents, multi-agent workflows, tool-calling systems, retrieval-augmented generation patterns, orchestration frameworks, and human-in-the-loop experiences.
Lead development of MLOps and AI control plane services, including model management, experiment tracking, metadata, deployment workflows, approval gates, APIs, SDKs, and self-service developer experiences.
Drive platform reliability and operational excellence through SLOs, observability, incident response, postmortems, production readiness reviews, automation, and capacity planning.
Hire, mentor, and grow a multidisciplinary engineering organization across AI infrastructure, platform services, MLOps, agentic systems, developer experience, and production operations.