The Inference team is responsible for delivering high-performance model serving capabilities that meet the needs of real production workloads. We work at the intersection of model behavior, serving systems, hardware, and customer requirements to improve throughput, latency, reliability, and quality across our inference stack.

We are looking for an Applied AI Engineer to help us understand, measure, and improve the real-world performance of our inference platform. In the near term, this role will focus on building and running rigorous benchmarks, profiling model and system behavior, identifying bottlenecks, and driving targeted optimizations for both platform-wide and customer-specific workloads.

This role is intentionally scoped around applied performance work in support of the Inference organization. Initial responsibilities center on benchmarking, optimization, and workload-driven research rather than broad ownership of frontier model research agendas. Over time, the scope of the role is expected to broaden as the team and product mature.
Job Type: Full-time
Career Level: Mid Level
Education Level: None listed