Member of Technical Staff - Inference Systems

Liquid AI
Cambridge, MA (Hybrid)

About The Position

Spun out of MIT CSAIL, we build general-purpose AI systems that run efficiently across deployment targets, from data center accelerators to on-device hardware, ensuring low latency, minimal memory usage, privacy, and reliability. We partner with enterprises across consumer electronics, automotive, life sciences, and financial services. We are scaling rapidly and need exceptional people to help us get there.

Our inference stack is central to everything we ship. You'll be a core part of the team responsible for the engine layer that runs our models in production and in partner environments, and for the benchmarking infrastructure we use to evaluate our own work and verify what partners bring to us. Day to day, that means working closely with research and product, but also directly with external engineering teams.

We need someone who:

  • Can pick up unfamiliar tools quickly and knows how to assess whether they're worth using.
  • Designs AI benchmarks and holds methodology to a high standard.
  • Cares about inference details, understands the tradeoffs, and checks what changed across the board before calling something done.
  • Doesn't consider a model port finished until they can prove the outputs are correct.

Requirements

  • Hands-on experience with at least one inference framework like llama.cpp, ONNX Runtime, or MLX, going beyond basic usage into internals and modification.
  • Experience designing and building benchmarking pipelines, including methodology, validation, and reproducibility.
  • Strong C++ and Python in performance-sensitive contexts.
  • Solid understanding of inference fundamentals: quantization, decoding strategies, memory layout, and how they interact.
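To give a flavor of the quantization side of these fundamentals, here is a minimal symmetric per-tensor int8 round-trip in Python. This is purely illustrative (the function names and NumPy-only setup are our own, not Liquid AI code); real inference stacks use per-channel or block-wise schemes, but the scale/round/clip structure and the half-step error bound are the same idea.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor int8 quantization: one scale from max |x|."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

x = np.random.randn(1024).astype(np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize(q, scale)

# Round-to-nearest bounds the per-element error by half a quantization step.
err = np.abs(x - x_hat).max()
assert err <= scale / 2 + 1e-6
```

Understanding why that error bound holds per tensor, and how it degrades per channel or after accumulation, is the kind of detail the role cares about.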

Nice To Haves

  • Experience porting models across runtimes and verifying numerical correctness.
  • Prior work with external partners or clients in a technical validation or evaluation capacity.
  • Familiarity with edge inference targets and the constraints that come with them.

Responsibilities

  • Design and build benchmark suites that cover inference performance, model quality, and knowledge evaluation across different hardware targets.
  • Run external partner verifications: evaluate their solutions against our benchmarks, identify gaps, and clearly deliver findings.
  • Port models like LFM2 onto different runtimes and frameworks, and verify correctness end-to-end.
  • Maintain and extend the inference engine layer built on llama.cpp, ONNX, and MLX as new model architectures emerge from research.
  • Make benchmark results explainable and verifiable, so internal teams and partners can trust and reproduce them independently.
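As a flavor of the methodology standard these responsibilities imply (warmup runs excluded, percentiles reported instead of a single mean), here is a toy latency-benchmark loop in Python. It is a sketch of the general pattern, not our benchmarking suite:

```python
import time
import statistics

def benchmark(fn, warmup: int = 5, iters: int = 50) -> dict:
    """Time fn, excluding warmup iterations, and report robust statistics."""
    for _ in range(warmup):  # warmup: amortize caches, JIT, allocator state
        fn()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1e3)  # milliseconds
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[int(0.95 * (iters - 1))],
        "iters": iters,
    }

report = benchmark(lambda: sum(range(10_000)))
assert report["p50_ms"] <= report["p95_ms"]
```

Making results explainable and reproducible means publishing exactly this kind of metadata alongside the numbers: warmup policy, iteration counts, percentile definitions, and the hardware and model configuration they were measured on.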

Benefits

  • Competitive base salary with equity in a unicorn-stage company
  • We pay 100% of medical, dental, and vision premiums for employees and dependents
  • 401(k) matching up to 4% of base pay
  • Unlimited PTO plus company-wide Refill Days throughout the year