RAGforge
Local-first RAG pipeline — PDF & Markdown ingestion, Qdrant retrieval, bge reranking, and an answer-quality eval harness. Pairs with turboquant-ml for quantized LLM serving.
What it is
A small, readable RAG library that you can run end-to-end on your own laptop with open-source models, no API key required — and that ships with an evaluation harness so you can actually measure whether your changes improve answer quality.
Three opinions:
- Local-first. BAAI/bge-small for embeddings, BAAI/bge-reranker-base for reranking, Qdrant embedded (no server), any HuggingFace causal LM for generation.
- Measurable. `rf eval` computes `context_recall`, `answer_relevance` and `faithfulness` in pure Python.
- Composable, not framework-y. Each stage is one short module behind a small interface. Swap any of them.
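To give a feel for what a pure-Python metric of this kind can look like, here is a minimal sketch of a token-overlap context recall. It is an illustration under stated assumptions, not RAGforge's implementation: the helper names `tokenize` and `context_recall`, the definition used (the fraction of reference-answer tokens found in the retrieved contexts), and the sample strings are all hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def context_recall(reference_answer: str, contexts: list[str]) -> float:
    """Fraction of reference-answer tokens that appear in any retrieved context.

    Hypothetical definition for illustration; the library's own metric may differ.
    """
    ref_tokens = tokenize(reference_answer)
    if not ref_tokens:
        # No reference tokens to recall, so report 0.0 rather than divide by zero.
        return 0.0
    ctx_tokens: set[str] = set()
    for ctx in contexts:
        ctx_tokens |= tokenize(ctx)
    return len(ref_tokens & ctx_tokens) / len(ref_tokens)


# Made-up example data, not taken from the sample corpus.
score = context_recall(
    "Refunds are accepted within 30 days",
    ["Our policy: refunds are accepted within 30 days of purchase."],
)
print(score)
```

A metric in this shape needs no model calls, which makes it cheap to run on every change. The trade-off is that plain token overlap misses paraphrases, so a real harness may well use something more robust.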
Install
pip install ragforge-ml # core
pip install "ragforge-ml[serve]" # + FastAPI
pip install "ragforge-ml[quantized]" # + turboquant-ml NF4 LLM path
pip install "ragforge-ml[all]" # everything
60-second tour
from ragforge import Pipeline
rag = Pipeline.from_defaults(model_id="Qwen/Qwen2.5-3B-Instruct")
rag.ingest(["data/sample/company-policy.md"])
answer = rag.ask("How long is the refund window?")
print(answer.text)
for src in answer.sources:
    print(f" {src.score:.3f} {src.metadata['path']}")
CLI
rf ingest data/sample/ --collection demo
rf ask "How long is the refund window?" --collection demo
rf eval data/sample/qa.jsonl --collection demo
rf serve --collection demo
See Ingestion, Retrieval, LLM serving and Evaluation for the details.