Retrieval & reranking
The retrieval path has two stages:
- Bi-encoder retrieval with a sentence-transformers model: fast approximate-nearest-neighbour (ANN) search via Qdrant returns the top ~20-50 candidates.
- Cross-encoder reranking with a BGE reranker: slower but more accurate, it narrows those candidates to the final top 3-5 that are passed to the LLM.
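The two-stage funnel can be sketched in plain Python. This is an illustration under stated assumptions, not ragforge's code: the corpus is a hypothetical dict mapping a chunk ID to (vector, text), stage 1 uses exact dot-product scoring where Qdrant would use an ANN index, and overlap is a toy stand-in for the cross-encoder.

```python
from __future__ import annotations

from typing import Callable, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def overlap(query: str, text: str) -> float:
    # Toy stand-in for a cross-encoder: count shared lowercase words.
    return float(len(set(query.lower().split()) & set(text.lower().split())))


def two_stage_search(
    query_vec: Sequence[float],
    query_text: str,
    corpus: dict[str, tuple[Sequence[float], str]],
    rerank_score: Callable[[str, str], float],
    top_k_retrieve: int = 20,
    top_k_final: int = 5,
) -> list[str]:
    # Stage 1: cheap bi-encoder scoring over the whole corpus (exact here; Qdrant approximates it with ANN).
    shortlist = sorted(corpus, key=lambda cid: dot(query_vec, corpus[cid][0]), reverse=True)[:top_k_retrieve]
    # Stage 2: expensive joint scoring, run only on the shortlist.
    reranked = sorted(shortlist, key=lambda cid: rerank_score(query_text, corpus[cid][1]), reverse=True)
    return reranked[:top_k_final]


corpus = {
    "a": ([1.0, 0.0], "refund window is 30 days"),
    "b": ([0.9, 0.1], "refund policy"),
    "c": ([0.0, 1.0], "shipping times"),
    "d": ([0.5, 0.5], "refund window"),
}
print(two_stage_search([1.0, 0.0], "refund window", corpus, overlap, top_k_retrieve=3, top_k_final=2))
# ['a', 'd']
```

The point of the split is cost: the expensive scorer only ever sees top_k_retrieve candidates, never the whole corpus. Here "c" is dropped in stage 1, and "d" overtakes "b" in stage 2.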
Encoder
Default: BAAI/bge-small-en-v1.5 (33M parameters, 384-dimensional embeddings). It is the smallest
model that still sits in the top tier of MTEB retrieval scores, and it encodes
a chunk in roughly 3 ms on CPU. For multilingual corpora, swap to
intfloat/multilingual-e5-small.
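The per-chunk figures above translate directly into ingest time and raw storage. A back-of-the-envelope helper, assuming sequential single-threaded encoding at the quoted ~3 ms per chunk and float32 vectors (4 bytes per value); the HNSW graph and payloads that Qdrant also stores come on top of the raw figure:

```python
from __future__ import annotations


def index_cost(n_chunks: int, dim: int = 384, ms_per_chunk: float = 3.0,
               bytes_per_value: int = 4) -> tuple[float, float]:
    """Return (encode_seconds, raw_vector_megabytes) for a corpus.

    Assumes a fixed per-chunk encoding cost and float32 storage;
    index structures and payloads are not counted.
    """
    encode_seconds = n_chunks * ms_per_chunk / 1000
    raw_megabytes = n_chunks * dim * bytes_per_value / 1_000_000
    return encode_seconds, raw_megabytes


seconds, megabytes = index_cost(100_000)
print(f"{seconds:.0f} s to encode, {megabytes:.1f} MB of raw vectors")
# 300 s to encode, 153.6 MB of raw vectors
```

So 100k chunks cost about five minutes of CPU encoding and roughly 150 MB of raw vectors before index overhead.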
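```python
from ragforge.embed import SentenceTransformerEncoder
# Load the default encoder on CPU.
enc = SentenceTransformerEncoder("BAAI/bge-small-en-v1.5", device="cpu")
# Encode a batch of texts: one 384-dim embedding per input string.
vecs = enc.encode(["hello world"])
```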
Vector store
Two backends ship out of the box: Qdrant (embedded or remote) and an in-memory
NumPy store. Both implement the same three methods: upsert, search, and count.
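The shared interface can be written down as a typing.Protocol, so any object with those three methods counts as a store. The parameter names and return types below are assumptions for illustration; check ragforge's own signatures before relying on them.

```python
from __future__ import annotations

from typing import Any, Optional, Protocol, Sequence, runtime_checkable


@runtime_checkable
class VectorStore(Protocol):
    """Hypothetical sketch of the interface shared by the vector-store backends."""

    def upsert(self, ids: Sequence[str], vectors: Sequence[Sequence[float]],
               payloads: Optional[Sequence[dict[str, Any]]] = None) -> None:
        """Insert points, overwriting any existing point with the same ID."""
        ...

    def search(self, query: Sequence[float], top_k: int = 10) -> list[tuple[str, float]]:
        """Return up to top_k (id, score) pairs, best match first."""
        ...

    def count(self) -> int:
        """Return the number of stored points."""
        ...
```

Because the check is structural, a test double or a new backend needs no base class: isinstance(store, VectorStore) is true as soon as it defines the three methods. Note that runtime_checkable only verifies that the methods exist, not their signatures.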
Qdrant (embedded, recommended)
No server, no Docker. The client uses a local on-disk index.
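```python
from ragforge.vectorstore import QdrantStore
# enc is the encoder created above; dim must match its embedding size.
# path is the local directory that holds the on-disk index.
store = QdrantStore(collection="demo", dim=enc.dim, path="qdrant_storage")
```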
Qdrant (remote)
For multi-process serving or shared indexes, point at a running Qdrant instance:
NumPy (in-memory)
For unit tests and tiny corpora (<10k vectors).
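A minimal sketch of what an in-memory NumPy backend does: exact (brute-force) cosine search over every stored vector. InMemoryStore and its method signatures are hypothetical, not ragforge's class; the sketch only illustrates why this backend suits small corpora, since every search scores all n stored vectors.

```python
from __future__ import annotations

from typing import Optional, Sequence

import numpy as np


class InMemoryStore:
    """Hypothetical brute-force store: exact cosine search over every stored vector."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self._vecs: dict[str, np.ndarray] = {}  # id -> unit-length float32 vector
        self._payloads: dict[str, Optional[dict]] = {}

    def upsert(self, ids: Sequence[str], vectors: Sequence[Sequence[float]],
               payloads: Optional[Sequence[dict]] = None) -> None:
        mat = np.asarray(vectors, dtype=np.float32).reshape(len(ids), self.dim)
        # Normalise rows to unit length so a dot product equals cosine similarity.
        mat = mat / np.maximum(np.linalg.norm(mat, axis=1, keepdims=True), 1e-12)
        if payloads is None:
            payloads = [None] * len(ids)
        for id_, vec, payload in zip(ids, mat, payloads):
            self._vecs[id_] = vec  # an existing ID is overwritten, not duplicated
            self._payloads[id_] = payload

    def search(self, query: Sequence[float], top_k: int = 10) -> list[tuple[str, float]]:
        if not self._vecs:
            return []
        ids = list(self._vecs)
        mat = np.stack([self._vecs[i] for i in ids])  # (n, dim), rebuilt per call: fine for small n
        q = np.asarray(query, dtype=np.float32)
        q = q / max(float(np.linalg.norm(q)), 1e-12)
        scores = mat @ q  # cosine similarity against all n vectors
        order = np.argsort(-scores, kind="stable")[:top_k]
        return [(ids[j], float(scores[j])) for j in order]

    def count(self) -> int:
        return len(self._vecs)


store = InMemoryStore(dim=2)
store.upsert(["a", "b", "c"], [[1, 0], [0, 1], [1, 1]])
print(store.search([1, 0], top_k=2))  # ids in order: a, then c
```

Brute force is exact and has no dependencies beyond NumPy, but search cost grows linearly with the number of stored vectors, which is where an ANN index such as Qdrant's takes over.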
Reranker
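The cross-encoder reads each (query, candidate) pair jointly and outputs one relevance
score, instead of comparing two independently computed embeddings. This gives much
higher nDCG than the bi-encoder alone, especially on multi-hop questions.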
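```python
from ragforge.rerank import BGEReranker
reranker = BGEReranker("BAAI/bge-reranker-base", device="cpu")
# hits: the first-stage candidates returned by the vector store's search.
# Keep the 5 candidates the cross-encoder scores highest.
top5 = reranker.rerank("How long is the refund window?", hits, top_k=5)
```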
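Conceptually, reranking means scoring every (query, passage) pair in one batch and sorting by that score. A minimal sketch, assuming hits are dicts with a "text" field (a hypothetical shape, not necessarily ragforge's), with a toy word-overlap scorer standing in for the model:

```python
from __future__ import annotations

from typing import Callable, List, Sequence, Tuple

# A batch scorer maps a list of (query, passage) pairs to one relevance score per pair.
PairScorer = Callable[[List[Tuple[str, str]]], Sequence[float]]


def overlap_scorer(pairs: List[Tuple[str, str]]) -> List[float]:
    # Toy stand-in for a cross-encoder: shared lowercase words per pair.
    return [float(len(set(q.lower().split()) & set(p.lower().split()))) for q, p in pairs]


def rerank(query: str, hits: Sequence[dict], score_pairs: PairScorer,
           top_k: int = 5, text_key: str = "text") -> list[dict]:
    """Re-order first-stage hits by a joint (query, passage) score and keep the best top_k."""
    if not hits:
        return []
    pairs = [(query, hit[text_key]) for hit in hits]  # score all pairs in a single batch
    scores = score_pairs(pairs)
    ranked = sorted(zip(hits, scores), key=lambda pair: pair[1], reverse=True)
    # Copy each hit and attach its score, leaving the caller's dicts untouched.
    return [{**hit, "rerank_score": float(score)} for hit, score in ranked[:top_k]]


hits = [
    {"id": "1", "text": "Shipping takes 5 days"},
    {"id": "2", "text": "The refund window is 30 days"},
    {"id": "3", "text": "Refunds need a receipt"},
]
print([h["id"] for h in rerank("How long is the refund window?", hits, overlap_scorer, top_k=2)])
# ['2', '1']
```

With sentence-transformers installed, CrossEncoder("BAAI/bge-reranker-base").predict accepts a list of (query, passage) pairs and returns one score per pair, so its bound predict method should slot in as score_pairs. Treat that as a suggestion to verify, not a description of how BGEReranker is implemented.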
Skipping the reranker is also a valid choice when latency is paramount: pass
use_reranker=False to Pipeline.from_defaults.
Tuning recipe
| Symptom | First thing to try |
|---|---|
| Top hits are mostly irrelevant | Increase top_k_retrieve, keep the reranker |
| Top hits are correct but the answer is wrong | Check the chunk size: too small loses context, too large confuses the LLM |
| Latency too high | Drop the reranker, or rerank fewer candidates |
| Bad on multilingual queries | Swap the encoder to intfloat/multilingual-e5-small |
| Duplicate doc IDs | Make sure the deterministic chunk ID is still intact (see pipeline._chunk_id) |
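Duplicates usually mean chunk IDs are no longer derived from the chunk itself. A minimal sketch of one deterministic scheme, not necessarily what pipeline._chunk_id does: hash a hypothetical (doc_id, chunk_index, text) triple into a name-based UUID, a format Qdrant accepts for point IDs alongside unsigned integers.

```python
import uuid


def chunk_id(doc_id: str, chunk_index: int, text: str) -> str:
    """Deterministic chunk ID: the same inputs always give the same UUID."""
    # uuid5 hashes the name (SHA-1) within a namespace, so re-ingesting a document
    # upserts onto the same point IDs instead of adding new ones.
    name = f"{doc_id}\x00{chunk_index}\x00{text}"
    return str(uuid.uuid5(uuid.NAMESPACE_URL, name))


# uuid.uuid4() here would mint a fresh random ID on every ingest and duplicate every chunk.
print(chunk_id("refund-policy.md", 0, "Refunds are accepted within 30 days."))
```

Including the text means an edited chunk gets a new ID, so its old version has to be deleted separately; keying on (doc_id, chunk_index) alone makes re-ingestion overwrite by position instead. Which trade-off fits depends on how documents are updated.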