About The Position

You own the AI core: model serving, the retrieval-augmented generation (RAG) pipeline, prompt engineering, and the feedback-to-training pipeline. In Phase 1, you make the base model perform as well as possible through context engineering — system prompts, few-shot exemplars, and retrieval optimisation — without modifying model weights. You also design the custom model training workflow so that enterprise clients can train their own fine-tuned models in Phase 2. This is the highest-leverage individual contributor role on the founding team.

Requirements

  • 5+ years ML engineering; 2+ years working with large language models in production.
  • Hands-on experience with LLM serving frameworks (vLLM, TGI, or equivalent).
  • Deep experience building RAG pipelines: chunking strategies, embedding models, vector databases, reranking.
  • Strong prompt engineering skills for production applications — you know how to make a base model produce consistent, structured, high-quality output.
  • Python: PyTorch, Transformers, FastAPI.
  • Familiar with LoRA/QLoRA fine-tuning workflows.

Nice To Haves

  • Experience building multi-tenant ML serving infrastructure.
  • Experience with financial or crypto AI applications.
  • Experience with cross-encoder reranking models (DeBERTa or similar).
  • Understanding of data isolation requirements for ML training pipelines.

Responsibilities

  • Deploy and optimise a large language model for production inference: quantisation, continuous batching, low-latency serving.
  • Build the RAG pipeline: document chunking, embedding generation, vector storage, cross-encoder reranking, and context assembly optimised for a 128K-token context window.
  • Build the context layer: per-tenant system prompts, dynamically retrieved few-shot exemplars, task routing (classifying incoming requests to the right prompt configuration).
  • Build defensive output parsing: structured JSON output from an unmodified base model with graceful fallbacks.
  • Design and implement the feedback collection pipeline: capturing user corrections and ratings, automatically generating training data candidates for future fine-tuning.
  • Design the custom model training workflow: tenant-scoped LoRA training on client-specific data, model evaluation, A/B testing, and isolated deployment.
  • Monitor and improve inference quality: parsing failure rates, citation accuracy, hallucination rates, latency — all tracked per tenant.
  • Iterate on prompts daily with the domain expert during the pilot phase.
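
The "defensive output parsing" responsibility above can be sketched as a chain of progressively looser recovery strategies. This is an illustrative sketch only, not part of the role description; the function name `parse_model_json` and the specific fallback order are assumptions:

```python
import json
import re

def parse_model_json(raw: str) -> dict:
    """Recover a JSON object from raw LLM output, trying progressively
    looser strategies; raise ValueError if nothing parses."""
    # 1. Strict: the output is exactly a JSON object.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # 2. Fenced: strip a ```json ... ``` code fence the model may wrap
    #    around its answer.
    fenced = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", raw, re.DOTALL)
    if fenced:
        try:
            return json.loads(fenced.group(1))
        except json.JSONDecodeError:
            pass
    # 3. Greedy: take the span from the first "{" to the last "}" and
    #    hope it is a well-formed object.
    start, end = raw.find("{"), raw.rfind("}")
    if start != -1 and end > start:
        try:
            return json.loads(raw[start : end + 1])
        except json.JSONDecodeError:
            pass
    # All fallbacks exhausted; the caller decides whether to retry the
    # model call or surface an error.
    raise ValueError("no JSON object recovered from model output")
```

In production this kind of parser would typically be paired with schema validation and a per-tenant parsing-failure metric, matching the monitoring responsibility above.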


What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: No Education Listed
  • Number of Employees: 1-10 employees
