ML Inference Router Engineer

eBay Inc. · Austin, TX
Posted 56 days ago · Hybrid

About The Position

At eBay, we're more than a global ecommerce leader - we're changing the way the world shops and sells. Our platform empowers millions of buyers and sellers in more than 190 markets around the world. We're committed to pushing boundaries and leaving our mark as we reinvent the future of ecommerce for enthusiasts. Our customers are our compass, authenticity thrives, bold ideas are welcome, and everyone can bring their unique selves to work - every day. We're in this together, sustaining the future of our customers, our company, and our planet. Join a team of passionate thinkers, innovators, and dreamers - and help us connect people and build communities to create economic opportunity for all.

About the team and the role: eBay's AI Platform team is building the next generation of agentic and inference technologies that power AI experiences for hundreds of millions of users worldwide. We are seeking an ML Inference Router Engineer to design and build a highly scalable, low-latency inference gateway capable of supporting billions of daily requests. This role sits at the core of eBay's AI infrastructure: developing distributed, fault-tolerant systems that orchestrate requests across diverse large language models (LLMs) and ensure high reliability, efficiency, and cost-effectiveness. If you are passionate about large-scale systems engineering, love solving hard performance problems, and want to shape the backbone of AI at global scale, we'd love to hear from you.

Requirements

  • 10+ years of experience building large-scale, fault-tolerant, high-performance distributed systems.
  • Strong programming skills in one or more of Java, Go, Rust, or C++ (Java preferred for gateway services).
  • Deep understanding of networking, concurrency, memory management, and performance tuning in production systems (a minimal back-pressure sketch follows this list).
  • Proven experience designing and operating low-latency APIs at very large scale (10M+ QPS).
  • Hands-on experience with Kubernetes, service meshes, and container orchestration at scale.
  • Strong background in cloud infrastructure (AWS, GCP, Azure) and distributed system design.
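
As a rough illustration of the concurrency and back-pressure concerns these requirements touch on, here is a minimal Java sketch of an admission gate that bounds in-flight work and sheds excess load instead of queuing it without limit. All names here are hypothetical, invented for illustration; this is not eBay code.

    import java.util.concurrent.Callable;
    import java.util.concurrent.Semaphore;
    import java.util.concurrent.TimeUnit;

    // Hypothetical admission gate for a low-latency service: a request either
    // acquires a permit within a short deadline or is rejected immediately,
    // so internal queues never grow unboundedly under overload.
    public final class AdmissionGate {
        private final Semaphore permits;
        private final long maxWaitMillis;

        public AdmissionGate(int maxConcurrent, long maxWaitMillis) {
            this.permits = new Semaphore(maxConcurrent);
            this.maxWaitMillis = maxWaitMillis;
        }

        // Runs the task if a permit arrives in time; otherwise sheds the request.
        public <T> T callOrShed(Callable<T> task) throws Exception {
            if (!permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS)) {
                throw new RejectedException("overloaded: no permit within " + maxWaitMillis + "ms");
            }
            try {
                return task.call();
            } finally {
                permits.release();
            }
        }

        public static final class RejectedException extends RuntimeException {
            RejectedException(String message) { super(message); }
        }
    }

Bounding concurrency this way keeps tail latency predictable under overload, which at the scale this role describes usually matters more than squeezing out peak throughput.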

Nice To Haves

  • Experience with inference serving frameworks (vLLM, Triton, TensorRT-LLM, FasterTransformer, DeepSpeed-MII, or similar).
  • Familiarity with LLM tokenization, batching, and scheduling strategies.
  • Background in microservice API gateway design (rate limiting, routing policies, authentication); a toy rate-limiter sketch follows this list.
  • Experience with real-time monitoring, tracing, and autoscaling of high-throughput systems.
  • Contributions to open-source distributed systems or ML serving projects.
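
To make the gateway-design bullet above concrete, here is a toy token-bucket rate limiter in Java. The class and its parameters are illustrative assumptions, not a description of eBay's gateway.

    // Hypothetical token-bucket rate limiter: refills continuously at a fixed
    // rate and rejects requests once the bucket is empty. Illustrative only.
    public final class TokenBucket {
        private final long capacity;          // maximum tokens the bucket holds
        private final double refillPerSecond; // steady-state refill rate
        private double tokens;                // current token count
        private long lastRefillNanos;         // timestamp of the last refill

        public TokenBucket(long capacity, double refillPerSecond) {
            this.capacity = capacity;
            this.refillPerSecond = refillPerSecond;
            this.tokens = capacity;
            this.lastRefillNanos = System.nanoTime();
        }

        // Returns true if the request may proceed, false if it is rate-limited.
        public synchronized boolean tryAcquire() {
            long now = System.nanoTime();
            double elapsedSeconds = (now - lastRefillNanos) / 1e9;
            tokens = Math.min(capacity, tokens + elapsedSeconds * refillPerSecond);
            lastRefillNanos = now;
            if (tokens >= 1.0) {
                tokens -= 1.0;
                return true;
            }
            return false;
        }
    }

A real gateway would shard limiters per tenant or per route, and would typically replace the coarse synchronized block with an atomic compare-and-swap loop on the hot path.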

Responsibilities

  • Design and build an LLM inference gateway that scales to billions of daily requests with millisecond-level latency.
  • Develop intelligent request routing, load balancing, and fallback mechanisms across heterogeneous LLM backends (internal and external); a minimal fallback sketch follows this list.
  • Optimize throughput, cost, and reliability of inference workloads in multi-tenant environments.
  • Collaborate with platform, research, and product teams to integrate new models and agentic capabilities into the gateway.
  • Implement observability, tracing, and autoscaling for inference traffic across Kubernetes-based clusters.
  • Conduct design and code reviews to ensure high standards in distributed systems architecture.
  • Stay current with advances in LLM serving, inference acceleration, and model APIs to continuously evolve the platform.
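
As a sketch of what routing with fallback across heterogeneous backends can look like, here is a minimal Java example. LlmBackend, FallbackRouter, and their methods are hypothetical names invented for this illustration, not eBay's actual interfaces.

    import java.util.List;

    // Hypothetical abstraction over an LLM backend (internal or external).
    interface LlmBackend {
        String name();
        boolean healthy();                        // e.g. fed by periodic health checks
        String complete(String prompt) throws Exception;
    }

    // Minimal routing-with-fallback: try backends in preference order,
    // skipping unhealthy ones and falling through on per-request errors.
    public final class FallbackRouter {
        private final List<LlmBackend> preferenceOrder;

        public FallbackRouter(List<LlmBackend> preferenceOrder) {
            this.preferenceOrder = preferenceOrder;
        }

        public String route(String prompt) {
            Exception lastError = null;
            for (LlmBackend backend : preferenceOrder) {
                if (!backend.healthy()) continue; // skip known-bad backends
                try {
                    return backend.complete(prompt);
                } catch (Exception e) {
                    lastError = e;                // fall through to the next backend
                }
            }
            throw new IllegalStateException("all backends failed", lastError);
        }
    }

A production router would layer load balancing across equally ranked backends, per-model cost and latency budgets, and circuit breaking on top of this fall-through loop so a failing backend is skipped quickly rather than retried on every request.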

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Industry: General Merchandise Retailers
  • Education Level: No Education Listed
  • Number of Employees: 5,001-10,000 employees
