v0.13.3 · MIT · Python 3.11+
Self-hosted documentation search for AI agents.
gnosis-mcp indexes your docs, git history, and crawled sites into a searchable knowledge base exposed over MCP. Zero config. SQLite by default. Hybrid FTS5 + vector with optional cross-encoder reranking.
Benchmark conditions: SQLite keyword search (FTS5 + BM25), in-memory, median of 3 runs on a laptop CPU. A typical agent lookup returns ~300–800 tokens of on-point snippets versus 3,000–15,000 tokens to read the full file: roughly 5–10× savings when your corpus covers the question, plus fewer hallucinations because the agent reads verbatim text rather than guessing. New to these metrics? See How we measure search quality, written for non-experts.
— What it does
Your docs, indexed
Markdown, text, notebooks, TOML, CSV, JSON. Optional rST + PDF. Heading-aware chunking that never splits inside code blocks or tables.
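A minimal sketch of what heading-aware chunking can look like, assuming Markdown input. The function name `chunk_markdown` and the `max_chars` parameter are hypothetical and this is not gnosis-mcp's actual chunker. It splits only at headings, or at a blank line once the chunk is over budget, and never inside a fenced code block. Table rows are neither blank nor headings, so they are never split points either.

```python
import re

# Simplification: any line starting with ``` or ~~~ toggles "inside a code block".
FENCE = re.compile(r"^\s{0,3}(```|~~~)")
HEADING = re.compile(r"^#{1,6}\s")


def chunk_markdown(text: str, max_chars: int = 2000) -> list[str]:
    """Split Markdown at headings, or at a blank line once a chunk is over budget."""
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    in_fence = False

    def flush() -> None:
        nonlocal size
        body = "\n".join(current).strip()
        if body:
            chunks.append(body)
        current.clear()
        size = 0

    for line in text.splitlines():
        is_fence = bool(FENCE.match(line))
        # Split points are only considered outside fenced code blocks.
        if not in_fence and not is_fence:
            over_budget = size >= max_chars and line.strip() == ""
            if HEADING.match(line) or over_budget:
                flush()
        current.append(line)
        size += len(line) + 1
        if is_fence:
            in_fence = not in_fence
    flush()
    return chunks
```

A chunk can exceed `max_chars` when a single code block is larger than the budget. That is the trade-off for never cutting a block in half.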
Hybrid search
BM25 + local ONNX embeddings merged via Reciprocal Rank Fusion. Tune the fusion constant with GNOSIS_MCP_RRF_K. No API key required.
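For readers new to Reciprocal Rank Fusion, here is a minimal sketch of the standard formula: each document scores the sum of `1 / (k + rank)` over every ranking it appears in. The function name is hypothetical, and `k = 60` is the value commonly used in the literature. Whether gnosis-mcp defaults to the same value is an assumption.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one scored list."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


keyword_hits = ["a", "b", "c"]  # e.g. a BM25 ranking
vector_hits = ["b", "d", "a"]   # e.g. an embedding ranking
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

A larger `k` flattens the difference between adjacent ranks. A smaller `k` rewards documents that sit at the very top of any one list.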
Cross-encoder rerank
Optional [reranking] extra — a 22 M-param ONNX cross-encoder.
Off by default; test on your corpus first — the bundled MS-MARCO model can hurt dev-doc retrieval.
Git history
Ingest commit messages as searchable context. Find the reason a line exists, not just the line itself.
Web crawl
Sitemap discovery or BFS. Robots.txt with same-host redirect guard. ETag/Last-Modified caching. Trafilatura extraction with per-page timeout.
MCP + REST on one port
9 MCP tools + 3 resources. Optional REST API on the same process with Bearer auth (timing-safe). File watcher for auto re-ingest.
— vs. alternatives
| | gnosis-mcp | Context7 | docs-mcp-server | mcp-local-rag |
|---|---|---|---|---|
| Your data stays on your machine | ● | — | ● | ● |
| Index private docs (not just public) | ● | — | ● | ● |
| No account / API key required | ● | — | ● | ● |
| Ingest docs + git history + web crawl | ● | crawl only | crawl | — |
| Git commits as searchable context | ● | — | — | — |
| Keyword + vector + optional rerank | ● | opaque | opt | partial |
| SQLite default · PostgreSQL at scale | ● | — | — | — |
| MCP + REST on one port | ● | MCP only | MCP only | MCP only |
| Reproducible eval harness | gnosis-mcp eval | — | — | — |
| Open source · reproducible benchmarks | MIT | proprietary | MIT | MIT |
Context7 is a hosted shortcut: they pre-crawl popular library docs so you can query a SaaS without setup. Convenient, but your queries leave your machine and you can't add private docs. gnosis-mcp ships its own crawler: `gnosis-mcp crawl https://docs.stripe.com` indexes the same vendor docs into your local SQLite, alongside your own private docs and git history. One query, one index, all yours.
— Measured numbers
Fast enough for agents.
8.7 ms mean, 13 ms p95 per MCP tool call. Your agent can call gnosis-mcp twenty times in one response and add under a quarter-second at the mean (20 × 8.7 ms ≈ 174 ms).
Scales past the laptop.
471 QPS and 6 ms p95 on a 10,000-doc corpus — SQLite, in-memory, no infrastructure. The scale curve crosses 1 ms p95 at around 1,000 docs.
Finds the right answer.
Hit@5 = 0.92, nDCG@10 = 0.87 on a real 558-doc dev-docs corpus after v0.11 tuning. nDCG@10 = 0.671 on BEIR SciFact, within 0.01 of the Lucene BM25 baseline (0.679).
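For reference, the metrics quoted here have short textbook definitions. The sketch below assumes binary relevance (a document is either relevant or not). The function names are hypothetical, and the actual eval harness may differ in details such as graded relevance or tie handling.

```python
import math


def hit_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """1.0 if any relevant document appears in the top k results, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result. Averaged over queries, this is MRR."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Discounted gain of the top k, normalised by the best achievable ordering."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0
```

Each function scores a single query. A corpus-level figure such as Hit@5 = 0.92 is the mean over all evaluation queries.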
You can reproduce it.
gnosis-mcp eval runs the retrieval-quality harness in about a second. Full methodology + QPS scale curve →
— What changed in v0.11
| Metric (real dev-docs corpus) | v0.10 | v0.11 | Δ |
|---|---|---|---|
| nDCG@10 | 0.8407 | 0.8702 | +0.0295 |
| MRR | 0.7813 | 0.7933 | +0.0120 |
| Hit@5 | 0.9200 | 0.9200 | — |
| p95 latency | 7 ms | 7 ms | — |
One config change: GNOSIS_MCP_CHUNK_SIZE lowered from 4000 to 2000 characters, the peak of a 7-point sweep on a real 558-doc corpus. The full write-up, including the reranker trap (three cross-encoder families tested, all hurt) and the "hybrid ≡ keyword when vocabulary matches" finding, is in bench-experiments.
— Get it running
Three install paths — one-command Claude Code plugin, manual copy-paste, or MCP-server-only. Full step-by-step guide: Installation Guide →
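    # Default: SQLite keyword search
    pip install gnosis-mcp
    gnosis-mcp ingest ./docs && gnosis-mcp serve

    # Optional: local embeddings for hybrid search
    pip install "gnosis-mcp[embeddings]"
    gnosis-mcp ingest ./docs --embed

    # Optional: cross-encoder reranking
    pip install "gnosis-mcp[reranking]"
    GNOSIS_MCP_RERANK_ENABLED=true gnosis-mcp serve

    # Optional: PostgreSQL backend
    pip install "gnosis-mcp[postgres]"
    export GNOSIS_MCP_DATABASE_URL=postgresql://...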
    {
      "mcpServers": {
        "gnosis": {
          "command": "gnosis-mcp",
          "args": ["serve"]
        }
      }
    }

Editor snippets for Cursor, Windsurf, VS Code, JetBrains, and Cline: view on GitHub.
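If an editor's MCP config file already lists other servers, the `gnosis` entry can be merged in rather than pasted over them. The sketch below is a hypothetical helper, not part of gnosis-mcp. The config path differs per editor, so it is passed in by the caller.

```python
import json
from pathlib import Path

# The server entry from the JSON snippet above.
GNOSIS_ENTRY = {"command": "gnosis-mcp", "args": ["serve"]}


def add_gnosis_server(config: dict) -> dict:
    """Return a copy of config with the gnosis entry added under mcpServers."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["gnosis"] = dict(GNOSIS_ENTRY)
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Read the config if it exists, merge the entry, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_gnosis_server(config), indent=2) + "\n")
```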
— Frequently asked
How do I save tokens when my AI coding agent reads documentation?
Point your agent at a gnosis-mcp server instead of pasting full doc files into context. gnosis-mcp does hybrid search (keyword + semantic) against your local index and returns only the relevant chunks — typically 200-500 tokens instead of 5,000-20,000 for a full file. The more docs you have, the larger the savings.
How do I stop my AI coding agent from hallucinating my own API?
Ground it in your real documentation via the Model Context Protocol. gnosis-mcp indexes your actual codebase, markdown docs, and git history, then exposes them as searchable tools to Claude Code, Cursor, Windsurf, and VS Code. The agent calls search_docs and gets verbatim excerpts with file paths — there is no gap for it to fill with a guess.
How do I give an AI agent access to my private documentation without sending it to OpenAI?
Run gnosis-mcp locally. It's a single Python process with SQLite storage: no cloud uploads, no vector database provider, no API keys. Your documents never leave your machine. MCP is what talks to the agent, and it runs over stdio or a local HTTP port you control.
What is the difference between MCP and traditional RAG?
Traditional RAG is invisible to the user — the application chunks, embeds, and retrieves before the LLM sees anything. MCP flips this: the LLM itself decides when to search, what query to use, and how many results to fetch. It's agent-controlled retrieval. gnosis-mcp implements the MCP spec so agents like Claude Code can orchestrate searches on your private data.
How do I reduce LLM context window usage for large codebases?
Index the codebase once with `gnosis-mcp ingest ./`, then let your agent call search_docs(query="...") per question. Instead of pasting 50 files into the context window every turn, the agent retrieves the 3-5 chunks that actually matter. For a 100k-token codebase this typically cuts per-turn context to 2-5% of the total, or roughly 2,000-5,000 tokens.
How do I search git commit history from an AI agent?
Run `gnosis-mcp ingest-git /path/to/repo` to index commit messages, authors, and diff summaries as first-class documents. Agents can then search "why did we remove X" or "commits that touch payment.py" and get real answers from git history — not guesses.
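A rough sketch of how commits can become searchable documents, assuming a plain `git log` with a custom pretty-format. This is not how `ingest-git` is implemented: the function names and document shape are hypothetical, and it covers messages and authors only, not diff summaries.

```python
import subprocess

FIELD_SEP = "\x1f"   # ASCII unit separator between fields
RECORD_SEP = "\x1e"  # ASCII record separator between commits
# Hash, author name, ISO 8601 author date, subject, body.
GIT_FORMAT = "%H%x1f%an%x1f%aI%x1f%s%x1f%b%x1e"


def parse_git_log(raw: str) -> list[dict[str, str]]:
    """Turn separator-delimited git log output into one document per commit."""
    docs = []
    for record in raw.split(RECORD_SEP):
        record = record.strip("\n")
        if not record:
            continue
        sha, author, date, subject, body = record.split(FIELD_SEP)
        docs.append(
            {
                "sha": sha,
                "author": author,
                "date": date,
                "text": f"{subject}\n\n{body}".strip(),
            }
        )
    return docs


def read_commits(repo: str, limit: int = 100) -> list[dict[str, str]]:
    """Run git log in the given repository and parse the result."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "-n", str(limit), f"--pretty=format:{GIT_FORMAT}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_git_log(result.stdout)
```

Control characters are used as separators because commit messages routinely contain newlines, pipes, and commas.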
Do I need a vector database or an embedding API key to use gnosis-mcp?
No. gnosis-mcp runs local ONNX embedding models (quantized, CPU-friendly) and stores the resulting vectors in SQLite alongside the FTS5 full-text index. Zero cloud dependencies. Optional: swap to PostgreSQL + pgvector for multi-user or large-scale deployments.
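To show that one SQLite file can hold both a full-text index and vectors, here is a minimal sketch using only the standard library. It assumes your Python's `sqlite3` module was built with FTS5, which is common but not guaranteed. The table and column names are hypothetical and do not reflect gnosis-mcp's real schema.

```python
import sqlite3
from array import array


def open_index() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(path, body)")
    conn.execute(
        "CREATE TABLE vectors (chunk_id INTEGER PRIMARY KEY, embedding BLOB NOT NULL)"
    )
    return conn


def add_chunk(
    conn: sqlite3.Connection, path: str, body: str, embedding: list[float]
) -> int:
    """Store the vector as a float32 BLOB and the text under the same rowid."""
    blob = array("f", embedding).tobytes()
    cur = conn.execute("INSERT INTO vectors (embedding) VALUES (?)", (blob,))
    chunk_id = cur.lastrowid
    conn.execute(
        "INSERT INTO chunks (rowid, path, body) VALUES (?, ?, ?)",
        (chunk_id, path, body),
    )
    return chunk_id


def keyword_search(
    conn: sqlite3.Connection, query: str, limit: int = 5
) -> list[tuple[str, float]]:
    """FTS5's bm25() returns lower values for better matches, so sort ascending."""
    return conn.execute(
        "SELECT path, bm25(chunks) AS score FROM chunks "
        "WHERE chunks MATCH ? ORDER BY score LIMIT ?",
        (query, limit),
    ).fetchall()


def load_embedding(conn: sqlite3.Connection, chunk_id: int) -> list[float]:
    (blob,) = conn.execute(
        "SELECT embedding FROM vectors WHERE chunk_id = ?", (chunk_id,)
    ).fetchone()
    vec = array("f")
    vec.frombytes(blob)
    return vec.tolist()
```

Sharing a rowid between the two tables keeps the keyword hit and its vector one join apart, with no separate vector store to run.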
Can I use gnosis-mcp with Claude Code, Cursor, or Windsurf?
Yes — all three. gnosis-mcp is a standard MCP server. For Claude Code: add to ~/.claude/settings.json under mcpServers. For Cursor: add via Settings → MCP. For Windsurf: add in mcp_config.json. See https://gnosismcp.com/doc/docs/overview for copy-paste configs per IDE.
What embedder does gnosis-mcp use and how good is search quality?
gnosis-mcp bundles MongoDB/mdbr-leaf-ir, ranked #1 on the MTEB/BEIR leaderboard for models ≤ 100 M params. Our keyword path lands within 0.01 nDCG@10 of the Lucene BM25 baseline on SciFact (0.671 vs 0.679). 632 tests, every release.