RAGforge

Local-first RAG pipeline — PDF & Markdown ingestion, Qdrant retrieval, bge reranking, and an answer-quality eval harness. Pairs with turboquant-ml for quantized LLM serving.

What it is

A small, readable RAG library that runs end-to-end on your own laptop with open-source models, no API key required. It ships with an evaluation harness, so you can measure whether a change actually improves answer quality.

Three opinions:

  1. Local-first. BAAI/bge-small for embeddings, BAAI/bge-reranker-base for reranking, Qdrant embedded (no server), any HuggingFace causal LM for generation.
  2. Measurable. rf eval computes context_recall, answer_relevance and faithfulness in pure Python.
  3. Composable, not framework-y. Each stage is one short module behind a small interface. Swap any of them.
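
To make the third opinion concrete, here is a minimal sketch of what "one short module behind a small interface" can look like. The names `Hit`, `Retriever`, `KeywordRetriever` and `top_context` are hypothetical and chosen for illustration only; they are not RAGforge's actual classes, and the sample chunks are made-up toy data.

```python
import re
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Hit:
    """One retrieved chunk with its relevance score (hypothetical type)."""
    text: str
    score: float
    metadata: dict = field(default_factory=dict)


class Retriever(Protocol):
    """Hypothetical stage interface: any object with this method fits."""

    def retrieve(self, query: str, k: int) -> list[Hit]: ...


def _tokens(text: str) -> set[str]:
    # Lowercased word tokens, punctuation stripped.
    return set(re.findall(r"\w+", text.lower()))


class KeywordRetriever:
    """Toy stand-in: scores each chunk by the fraction of query words it contains."""

    def __init__(self, chunks: list[str]) -> None:
        self.chunks = chunks

    def retrieve(self, query: str, k: int) -> list[Hit]:
        words = _tokens(query)
        hits = [
            Hit(text=chunk, score=len(words & _tokens(chunk)) / max(len(words), 1))
            for chunk in self.chunks
        ]
        hits.sort(key=lambda hit: hit.score, reverse=True)
        return hits[:k]


def top_context(retriever: Retriever, query: str, k: int = 2) -> list[str]:
    """Depends only on the interface, so any retriever can be swapped in."""
    return [hit.text for hit in retriever.retrieve(query, k)]


if __name__ == "__main__":
    chunks = ["Refunds are accepted within 30 days.", "Offices close at 6 pm."]
    print(top_context(KeywordRetriever(chunks), "how many days for refunds", k=1))
```

Because `top_context` is typed against the protocol rather than a concrete class, a dense retriever or a reranking wrapper could replace `KeywordRetriever` without touching the calling code.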

Install

pip install ragforge-ml                       # core
pip install "ragforge-ml[serve]"              # + FastAPI
pip install "ragforge-ml[quantized]"          # + turboquant-ml NF4 LLM path
pip install "ragforge-ml[all]"                # everything

60-second tour

from ragforge import Pipeline

rag = Pipeline.from_defaults(model_id="Qwen/Qwen2.5-3B-Instruct")
rag.ingest(["data/sample/company-policy.md"])

answer = rag.ask("How long is the refund window?")
print(answer.text)
for src in answer.sources:
    print(f"  {src.score:.3f}  {src.metadata['path']}")

CLI

rf ingest data/sample/ --collection demo
rf ask "How long is the refund window?" --collection demo
rf eval data/sample/qa.jsonl --collection demo
rf serve --collection demo
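
For intuition about what a pure-Python metric such as context_recall can look like, here is a rough sketch. It assumes a common definition (the fraction of reference facts that are supported by at least one retrieved context) and approximates "supported" with token overlap. The function name, the `threshold` parameter and the overlap heuristic are assumptions made for this example, not necessarily how rf eval computes the metric.

```python
import re


def _tokens(text: str) -> set[str]:
    # Lowercased word tokens, punctuation stripped.
    return set(re.findall(r"\w+", text.lower()))


def context_recall(
    reference_facts: list[str],
    contexts: list[str],
    threshold: float = 0.8,  # assumed cut-off for counting a fact as covered
) -> float:
    """Fraction of reference facts mostly covered by a single retrieved context."""
    if not reference_facts:
        return 0.0
    context_tokens = [_tokens(context) for context in contexts]
    covered = 0
    for fact in reference_facts:
        fact_tokens = _tokens(fact)
        if not fact_tokens:
            continue
        # Best coverage of this fact's tokens by any one context.
        best = max(
            (len(fact_tokens & tokens) / len(fact_tokens) for tokens in context_tokens),
            default=0.0,
        )
        if best >= threshold:
            covered += 1
    return covered / len(reference_facts)


if __name__ == "__main__":
    facts = ["refunds within 30 days", "offices close at 6 pm"]
    retrieved = ["Refunds are accepted within 30 days of purchase."]
    print(context_recall(facts, retrieved))
```

Token overlap is a crude proxy: it misses paraphrases and can be fooled by shared vocabulary, which is why the threshold is exposed as a parameter here.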

See Ingestion, Retrieval, LLM serving and Evaluation for the details.