Software Developer/Engineer (Mid-Level Experience)

Tri-Force Consulting Services, Inc. • Philadelphia, PA

Hybrid

About The Position

Consultant Requirements: On-Prem LLM & Vector DB Implementation

This consulting engagement covers three areas, each detailed under Requirements: core experience deploying open-source LLMs in on-prem or private environments, vector databases and RAG, and security and governance.

Deliverables

Reference architecture and deployment guidance
Working prototype (LLM + vector DB + RAG)
Documentation and knowledge transfer to internal teams

If you are bright, motivated, skilled, and enthusiastic, a thinker and a difference-maker who gets things done with minimal direction, juggles multiple tasks, communicates effectively, and leads, then we would like to hear from you. We need exceptionally capable people for this role with our client, so get back to us and tell us why you think you are a fit.

Requirements

Hands-on experience deploying open-source LLMs such as Meta Llama 3 and Mistral / Mixtral in on-prem or private environments
Strong proficiency in Python for LLM inference, prompt engineering, and integration
Experience with CPU-based inference, model quantization, and performance tuning
Practical experience with open-source vector databases such as Qdrant, Chroma, Milvus, or pgvector
Proven implementation of Retrieval-Augmented Generation (RAG) pipelines
Experience generating and managing embeddings and metadata filtering
Understanding of data privacy, air-gapped deployments, and enterprise security requirements
Experience implementing access controls and audit logging

Nice To Haves

Experience with LangChain or LlamaIndex
Exposure to Rust, Go, or C++ for high-performance services
Familiarity with Docker and Kubernetes for on-prem deployments
Knowledge of inference frameworks (e.g., vLLM, llama.cpp, Hugging Face Transformers)
Prior work in regulated or enterprise environments