The engineer will work with senior engineers and researchers on AI training and inference systems, with a strong focus on LLM execution engines, data and KV‑cache management, and multi‑tier memory hierarchies across modern data‑center platforms. The role centers on end‑to‑end performance characterization and optimization of large‑scale AI workloads, spanning single‑node GPUs to rack‑scale inference deployments. Responsibilities include systems software development, workload engineering, performance analysis, and memory‑centric optimization for LLM training, serving, and agentic AI frameworks. The work emphasizes real customer inference and training workloads, emerging memory technologies (HBM, LP/DRAM, CXL, NVMe, remote memory fabrics), and the economics and token‑level efficiency of large‑scale inference systems. This role combines hands‑on engineering with applied systems research, directly influencing next‑generation AI platforms and memory‑driven system architectures.
Job Type
Full-time
Career Level
Mid Level