Software Developer/Engineer (Mid-Level Experience)

Tri-Force Consulting Services, Inc., Philadelphia, PA
Hybrid

About The Position

Consultant Requirements – On-Prem LLM & Vector DB Implementation

If you are bright, motivated, skilled, and enthusiastic; a difference-maker and a thinker; able to get things done with minimal direction; able to juggle multiple tasks; and able to communicate effectively and lead, we would like to hear from you. Our client needs exceptionally capable people for this role, so get back to us and tell us why you think you are a fit.

Requirements

  • Hands-on experience deploying open-source LLMs such as Meta Llama 3 and Mistral / Mixtral in on-prem or private environments
  • Strong proficiency in Python for LLM inference, prompt engineering, and integration
  • Experience with CPU-based inference, model quantization, and performance tuning
  • Practical experience with open-source vector databases such as Qdrant, Chroma, Milvus, or pgvector
  • Proven implementation of Retrieval-Augmented Generation (RAG) pipelines
  • Experience generating and managing embeddings and applying metadata filtering
  • Understanding of data privacy, air-gapped deployments, and enterprise security requirements
  • Experience implementing access controls and audit logging

Nice To Haves

  • Experience with LangChain or LlamaIndex
  • Exposure to Rust, Go, or C++ for high-performance services
  • Familiarity with Docker and Kubernetes for on-prem deployments
  • Knowledge of inference frameworks (e.g., vLLM, llama.cpp, Hugging Face Transformers)
  • Prior work in regulated or enterprise environments

Responsibilities

  • Deliver a reference architecture and deployment guidance
  • Deliver a working prototype (LLM + vector DB + RAG)
  • Provide documentation and knowledge transfer to internal teams
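For candidates gauging fit: the prototype deliverable combines several of the listed requirements (embeddings, metadata filtering, access controls, audit logging, and RAG). The following is a minimal, illustrative sketch of the retrieval step only, assuming toy hand-written embeddings and an in-memory list standing in for a real vector database such as Qdrant, Chroma, Milvus, or pgvector. `Chunk`, `retrieve`, `build_prompt`, and `AUDIT_LOG` are hypothetical names, not part of any library or of the client's systems, and the call to a locally hosted LLM is not shown.

```python
import math
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical in-memory stand-in for a vector DB record: text, embedding, metadata.
@dataclass
class Chunk:
    text: str
    embedding: list[float]
    metadata: dict = field(default_factory=dict)

# Hypothetical audit log; a real system would write to durable, append-only storage.
AUDIT_LOG: list[dict] = []

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query_embedding, chunks, user_groups, filters=None, k=2):
    """Return the top-k chunks the caller may see, after metadata filtering."""
    filters = filters or {}
    # Access control: keep only chunks whose "group" the user belongs to.
    # Metadata filtering: every requested key/value pair must match exactly.
    allowed = [
        c for c in chunks
        if c.metadata.get("group") in user_groups
        and all(c.metadata.get(key) == value for key, value in filters.items())
    ]
    # Rank the remaining chunks by similarity to the query embedding.
    ranked = sorted(allowed, key=lambda c: cosine(query_embedding, c.embedding), reverse=True)
    top = ranked[:k]
    # Audit logging: record which groups asked, with which filters, and what was returned.
    AUDIT_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "groups": sorted(user_groups),
        "filters": dict(filters),
        "returned": [c.metadata.get("id") for c in top],
    })
    return top

def build_prompt(question: str, context: list[Chunk]) -> str:
    """Augment the question with retrieved context before it goes to a local LLM."""
    joined = "\n".join(f"- {c.text}" for c in context)
    return f"Answer using only the context below.\nContext:\n{joined}\nQuestion: {question}"

# Toy 3-dimensional embeddings; real ones would come from an embedding model.
chunks = [
    Chunk("VPN access requires MFA.", [0.9, 0.1, 0.0], {"id": "sec-1", "group": "it", "lang": "en"}),
    Chunk("Payroll runs on the 15th.", [0.1, 0.9, 0.0], {"id": "hr-1", "group": "hr", "lang": "en"}),
    Chunk("Firewall rules are reviewed quarterly.", [0.8, 0.2, 0.1], {"id": "sec-2", "group": "it", "lang": "en"}),
]

hits = retrieve([1.0, 0.0, 0.0], chunks, user_groups={"it"}, filters={"lang": "en"})
prompt = build_prompt("Is MFA required for VPN access?", hits)
print([c.metadata["id"] for c in hits])  # ['sec-1', 'sec-2']
```

The HR chunk never reaches the prompt because the user is only in the "it" group. In a real deployment, the vector database would typically apply the metadata and access-control filters inside the query itself, so unauthorized chunks never leave the store.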