# Tune
Every RAG system has a distribution where it breaks. The only way to know whether gnosis-mcp's defaults are right for your corpus is to measure them against your queries. This skill runs that measurement.
## Usage
```
/gnosis:tune                      # Quick sweep (5 chunk sizes, keyword mode)
/gnosis:tune full                 # Extended sweep (keyword + hybrid + rerank on/off)
/gnosis:tune --golden ./q.jsonl   # Use a specific golden-query file
```

Mode: $ARGUMENTS
## Prerequisites
You need:
1. **A corpus** — anywhere on disk. Will be passed to `gnosis-mcp ingest`.
2. **A golden-query file** — one JSON object per line:

```jsonl
{"query": "how does our auth work", "expected_paths": ["docs/auth", "architecture/auth"]}
{"query": "stripe webhook failure runbook", "expected_paths": ["runbooks/stripe"]}
```

`expected_paths` uses case-insensitive substring matching against the returned `file_path` — generous enough that you don't need to memorize exact paths.
20 hand-written queries is enough to get signal. 50 is plenty.
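As a sanity check before spending a sweep's worth of time, you can validate the file's shape with `jq` (assumed to be installed). The sketch below writes the two sample queries from above to a scratch file so it doesn't touch your real golden set:

```shell
# Write the two sample queries to a scratch file, then verify every line
# is valid JSON with a "query" string and an "expected_paths" array.
cat > golden-example.jsonl <<'EOF'
{"query": "how does our auth work", "expected_paths": ["docs/auth", "architecture/auth"]}
{"query": "stripe webhook failure runbook", "expected_paths": ["runbooks/stripe"]}
EOF

jq -e -s 'all((.query | type == "string") and (.expected_paths | type == "array"))' \
  < golden-example.jsonl > /dev/null && echo "golden file OK"
```

Point the same `jq` filter at your real file before a run; a malformed line fails fast here instead of mid-sweep.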
## Quick sweep (default)
Runs `gnosis-mcp ingest` five times with different chunk sizes, scoring each run against your golden set. Keyword mode only (fast, no embedding cost).
```shell
# Default paths if none given: ./docs + ./golden.jsonl
CORPUS=${CORPUS:-./docs}
GOLDEN=${GOLDEN:-./golden.jsonl}

for size in 1000 1500 2000 2500 3000; do
  uv run --with 'gnosis-mcp[embeddings] @ .' \
    python tests/bench/bench_real_corpus.py \
    --corpus "$CORPUS" --golden "$GOLDEN" \
    --modes keyword --chunk-size "$size" \
    --out "bench-results/tune-chunk${size}.json"
done
```
(If gnosis-mcp was installed via pip rather than from source, run `bench_real_corpus.py` from the repo you cloned and drop the `uv run --with '... @ .'` prefix.)
Expected runtime: 5–10 minutes per size on a typical laptop (ingest is the dominant cost; scoring itself is sub-second).
Output table:

```
chunk chars │ nDCG@10 │ MRR    │ Hit@5 │ p95   │ ingest
────────────┼─────────┼────────┼───────┼───────┼────────
1000        │ 0.8557  │ 0.8067 │ 0.92  │ 30 ms │ 592 s
1500        │ 0.8529  │ 0.7967 │ 0.92  │  7 ms │ 234 s
2000        │ 0.8702  │ 0.7933 │ 0.92  │  7 ms │ 210 s  ← peak
2500        │ 0.8602  │ 0.7880 │ 0.92  │  7 ms │ 195 s
3000        │ 0.8459  │ 0.7880 │ 0.92  │  7 ms │ 182 s
```
Pick the chunk size with the best nDCG@10 that doesn't blow your ingest-time budget. Persist it:
```shell
# In your shell profile / .env
export GNOSIS_MCP_CHUNK_SIZE=2000
```
If multiple sizes tie at the top (plateau), pick the larger one — same quality, fewer chunks, faster ingest.
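To compare runs without eyeballing JSON, something like this ranks the result files. The `ndcg_at_10` field name is a guess at the bench script's output schema; inspect one file with `jq .` and adjust the path to whatever it actually emits:

```shell
# Print the three best chunk sizes by nDCG@10, highest first.
# `ndcg_at_10` is an assumed field name -- check your actual result schema.
for f in bench-results/tune-chunk*.json; do
  [ -f "$f" ] || continue                 # glob didn't match: no results yet
  printf '%s\t%s\n' "$(jq -r '.ndcg_at_10' "$f")" "$f"
done | sort -rn | head -3
```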
## Full sweep (`full`)
Adds hybrid mode and an optional reranker A/B — expect roughly 30–40 minutes, depending on corpus size and whether the `[reranking]` extra is installed.
```shell
for size in 1500 2000 3000; do
  for mode in keyword hybrid hybrid+rerank; do
    uv run --with 'gnosis-mcp[embeddings,reranking] @ .' \
      python tests/bench/bench_real_corpus.py \
      --corpus "$CORPUS" --golden "$GOLDEN" \
      --modes "$mode" --chunk-size "$size" \
      --out "bench-results/tune-${mode//+/_}-${size}.json"
  done
done
```
## What to read from the output
- **nDCG delta between keyword and hybrid**: if < 0.02, your corpus is vocabulary-matched (queries share exact terms with docs). Dense retrieval adds latency without lift — leave hybrid off.
- **nDCG delta when the reranker is added**: on dev docs specifically, we measure −27 nDCG (!). If you see a similar drop, your corpus has the same distribution problem — disable `GNOSIS_MCP_RERANK_ENABLED`. If it's +3–10 on your corpus, keep it on.
- **Chunk-size peak location**: tells you the typical topic-coherent block length in your corpus. API-reference-heavy corpora peak lower (1000–1500 chars). Long-form prose peaks higher (2500–3000).
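The first rule can be scripted rather than read off by hand. A sketch, assuming the full-sweep output filenames from the loop above and a top-level `ndcg_at_10` field (both are assumptions; adjust to your actual filenames and schema):

```shell
# Compare keyword vs hybrid at one chunk size; a delta under 0.02 means
# dense retrieval isn't paying for its latency on this corpus.
KW=$(jq -r '.ndcg_at_10' bench-results/tune-keyword-2000.json 2>/dev/null || echo 0)
HY=$(jq -r '.ndcg_at_10' bench-results/tune-hybrid-2000.json 2>/dev/null || echo 0)
awk -v kw="$KW" -v hy="$HY" 'BEGIN {
  if (hy - kw < 0.02) print "vocabulary-matched: leave hybrid off"
  else                print "hybrid lifts quality: keep it on"
}'
```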
## Programmatic golden-set from existing access logs
If you've been using gnosis-mcp for a while, real queries are already logged. Generate a seed golden file from them:
```shell
sqlite3 ~/.local/share/gnosis-mcp/docs.db \
  "SELECT query FROM search_access_log
   WHERE timestamp > date('now', '-30 days')
   GROUP BY query ORDER BY COUNT(*) DESC LIMIT 30;" \
  | awk '{print "{\"query\":\"" $0 "\", \"expected_paths\": []}"}' > golden-seed.jsonl
```
Then fill in `expected_paths` by hand — five minutes of work that gives you a golden set grounded in how you actually use the system.
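One caveat with the awk one-liner: a logged query containing a double quote produces a broken JSON line. Since `jq` handles escaping, `-R` mode is a safer alternative. The sketch below demonstrates on two literal queries; in practice, pipe the `sqlite3` output into it in place of the `printf`:

```shell
# jq -R reads each raw line as a string and JSON-escapes it, so embedded
# quotes survive. Replace the printf with the sqlite3 query above.
printf '%s\n' 'how does our auth work' 'what does "wipe" do on ingest' \
  | jq -R -c '{query: ., expected_paths: []}' > golden-seed.jsonl
cat golden-seed.jsonl
```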
## Automate the winner
After picking a chunk size and reranker/hybrid choice, persist it so
every gnosis-mcp ingest (manual or via --watch) uses the tuned
settings:
```shell
# Persistent across processes
export GNOSIS_MCP_CHUNK_SIZE=2000
export GNOSIS_MCP_RERANK_ENABLED=false   # if tune showed it hurts

# Re-ingest with the new config
gnosis-mcp ingest ./docs --embed --wipe
```
## When to re-tune
- Corpus grows 2× or more
- You add a fundamentally different content type (e.g., API references added to a prose-only corpus, or vice versa)
- You switch embedder (`GNOSIS_MCP_EMBED_MODEL`) — the semantic side of hybrid now behaves differently
- After a new release of gnosis-mcp that touched chunking or fusion
Otherwise: tune once per year. It's not free but it's not frequent.
## See also
- bench-experiments-2026-04-18 — our own tune results with the methodology that informed these defaults
- how-we-measure-search — what these metrics mean, in plain English
- `/gnosis:ingest` — re-ingest after you've picked the winner