We are seeking a highly technical Inference Engine Engineer to optimize the performance and efficiency of our core inference engine. In this role, you will focus on designing, implementing, and optimizing GPU kernels and supporting infrastructure for next-generation generative and agentic AI workloads. Your work will directly power the most latency-critical and compute-intensive systems deployed by our customers. We are looking for an exceptional engineer with a strong foundation in GPU programming and compiler infrastructure. The ideal candidate enjoys pushing performance boundaries and has experience supporting production-scale machine learning applications.
Stand Out From the Crowd
Upload your resume and get instant feedback on how well it matches this job.
Job Type
Full-time
Career Level
Mid Level
Number of Employees
1-10 employees