v0.13.3 · MIT · Python 3.11+

Self-hosted documentation search for AI agents.

gnosis-mcp indexes your docs, git history, and crawled sites into a searchable knowledge base exposed over MCP. Zero config. SQLite by default. Hybrid FTS5 + vector with optional cross-encoder reranking.

$ pip install gnosis-mcp
Hit@5 on real dev docs: 0.92
nDCG@10 (real corpus, tuned): 0.870
nDCG@10 on BEIR SciFact: 0.671
QPS, keyword search: 9,463

The QPS figure is SQLite keyword search (FTS5 + BM25), in-memory, median of 3 runs on a laptop CPU. A typical agent lookup returns ~300–800 tokens of on-point snippets versus 3,000–15,000 tokens to read the full file: roughly 5–10× savings when your corpus covers the question, plus fewer hallucinations because the agent reads verbatim text rather than guessing.

Don't know what these mean? See "How we measure search quality", written for non-experts.

— What it does

Your docs, indexed

Markdown, text, notebooks, TOML, CSV, JSON. Optional rST + PDF. Heading-aware chunking that never splits inside code blocks or tables.
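A rough sketch of what heading-aware chunking can look like. This is not gnosis-mcp's actual chunker: `chunk_markdown` and `max_chars` are hypothetical names, and the rule shown (break at headings or when the size limit is hit, but never while inside a fenced block or between two table rows) is one plausible reading of the description above.

```python
import re

FENCE = re.compile(r"^\s*(```|~~~)")
HEADING = re.compile(r"^#{1,6}\s")


def chunk_markdown(text: str, max_chars: int = 2000) -> list[str]:
    """Split markdown into chunks without cutting code fences or tables."""
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    in_fence = False
    prev_table = False

    def flush() -> None:
        nonlocal current, size
        if current:
            chunks.append("\n".join(current))
        current, size = [], 0

    for line in text.splitlines():
        is_table = line.lstrip().startswith("|")
        # A break is only legal outside a fence and not between two table rows.
        can_break = not in_fence and not (is_table and prev_table)
        too_big = size + len(line) + 1 > max_chars
        if can_break and current and (HEADING.match(line) or too_big):
            flush()
        current.append(line)
        size += len(line) + 1
        if FENCE.match(line):
            in_fence = not in_fence  # opening or closing fence
        prev_table = is_table and not in_fence
    flush()
    return chunks
```

A chunk can exceed `max_chars` when a single code block or table is larger than the limit; the sketch prefers an oversized chunk to a broken one.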

Hybrid search

BM25 + local ONNX embeddings merged via Reciprocal Rank Fusion. Tune the fusion constant with GNOSIS_MCP_RRF_K. No API key required.
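For readers new to Reciprocal Rank Fusion: each document scores the sum of 1 / (k + rank) over every ranking it appears in. The sketch below is the textbook formula, not gnosis-mcp's code. `reciprocal_rank_fusion` is a hypothetical name, and the fallback of 60 is the value commonly used in the RRF literature, assumed here rather than taken from gnosis-mcp's documented default.

```python
import os
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document ids into one scored list."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Assumed default: 60 is conventional, not confirmed for gnosis-mcp.
k = int(os.environ.get("GNOSIS_MCP_RRF_K", "60"))
keyword_hits = ["auth.md", "rest.md", "crawl.md"]   # e.g. BM25 order
vector_hits = ["rest.md", "tokens.md", "auth.md"]   # e.g. embedding order
fused = reciprocal_rank_fusion([keyword_hits, vector_hits], k=k)
```

A larger k flattens the difference between rank 1 and rank 10, so documents that appear in both lists win more often; a smaller k rewards top positions in either list.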

Cross-encoder rerank

Optional [reranking] extra — a 22 M-param ONNX cross-encoder. Off by default; test on your corpus first — the bundled MS-MARCO model can hurt dev-doc retrieval.

Git history

Ingest commit messages as searchable context. Find the reason a line exists, not just the line itself.

Web crawl

Sitemap discovery or BFS. Robots.txt with same-host redirect guard. ETag/Last-Modified caching. Trafilatura extraction with per-page timeout.
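Two of these pieces are easy to illustrate with the standard library. The sketch below is an assumption about the general technique, not the crawler's real code: `allowed_by_robots` and `conditional_headers` are hypothetical helpers, and the cache entry shape (`etag`, `last_modified` keys) is invented for the example.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against an already-downloaded robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def conditional_headers(cached: dict[str, str]) -> dict[str, str]:
    """Build revalidation headers from validators saved on a previous fetch."""
    headers: dict[str, str] = {}
    if etag := cached.get("etag"):
        headers["If-None-Match"] = etag
    if last_modified := cached.get("last_modified"):
        headers["If-Modified-Since"] = last_modified
    return headers
```

When the server answers such a request with 304 Not Modified, the page has not changed and the previously extracted text can be reused without re-downloading or re-indexing.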

MCP + REST on one port

9 MCP tools + 3 resources. Optional REST API on the same process with Bearer auth (timing-safe). File watcher for auto re-ingest.

— vs. alternatives

| | gnosis-mcp | Context7 | docs-mcp-server | mcp-local-rag |
|---|---|---|---|---|
| Your data stays on your machine | | | | |
| Index private docs (not just public) | | | | |
| No account / API key required | | | | |
| Ingest docs + git history + web crawl | | crawl only | crawl | |
| Git commits as searchable context | | | | |
| Keyword + vector + optional rerank | | opaque | opt | partial |
| SQLite default · PostgreSQL at scale | | | | |
| MCP + REST on one port | | MCP only | MCP only | MCP only |
| Reproducible eval harness | gnosis-mcp eval | | | |
| Open source · reproducible benchmarks | MIT | proprietary | MIT | MIT |

Context7 is a hosted shortcut — they pre-crawl popular library docs so you query a SaaS without setup. Convenient, but your queries leave your machine and you can't add private docs. gnosis-mcp ships a crawler — gnosis-mcp crawl https://docs.stripe.com indexes the same vendor docs into your local SQLite, alongside your own private docs and git history. One query, one index, all yours.

— Measured numbers

Fast enough for agents.

8.7 ms mean, 13 ms p95 per MCP tool call. Twenty sequential calls in one response add about 175 ms at the mean (20 × 8.7 ms), under a quarter-second; if every call landed at the p95, the total would be about 260 ms.

Scales past the laptop.

471 QPS and 6 ms p95 on a 10,000-doc corpus — SQLite, in-memory, no infrastructure. The scale curve crosses 1 ms p95 at around 1,000 docs.

Finds the right answer.

Hit@5 = 0.92, nDCG@10 = 0.87 on a real 558-doc dev-docs corpus after v0.11 tuning. nDCG@10 = 0.671 on BEIR SciFact, within 0.01 of the Lucene BM25 baseline (0.679).
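If the metric names are unfamiliar, here is how they are conventionally computed for a single query. This is a sketch assuming binary relevance (a document is either relevant or not); the eval harness may grade differently, and the function names are hypothetical. The reported numbers are averages of these per-query values.

```python
import math


def hit_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """1.0 if any relevant document appears in the top k, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result; averaged over queries this is MRR."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Discounted gain of the top k, normalised by the best possible ordering."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal = sum(
        1.0 / math.log2(rank + 1)
        for rank in range(1, min(len(relevant), k) + 1)
    )
    return dcg / ideal if ideal else 0.0
```

Hit@5 only asks whether a right answer made the top five; nDCG@10 also rewards putting it near the top, which is why the two numbers can move independently.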

You can reproduce it.

gnosis-mcp eval runs the retrieval-quality harness in about a second. Full methodology + QPS scale curve →

— What changed in v0.11

| Metric (real dev-docs corpus) | v0.10 | v0.11 | Δ |
|---|---|---|---|
| nDCG@10 | 0.8407 | 0.8702 | +0.0295 |
| MRR | 0.7813 | 0.7933 | +0.0120 |
| Hit@5 | 0.9200 | 0.9200 | |
| p95 latency | 7 ms | 7 ms | |

One config change: GNOSIS_MCP_CHUNK_SIZE lowered from 4000 to 2000 characters, the peak of a 7-point sweep on a real 558-doc corpus. The full write-up, including the reranker trap (three cross-encoder families tested, all hurt) and the "hybrid ≡ keyword when vocabulary matches" finding, is in bench-experiments.

— Get it running

Three install paths: one-command Claude Code plugin, manual copy-paste, or MCP-server-only. For the full step-by-step walkthrough, see the Installation Guide.

Install
$ pip install gnosis-mcp
$ gnosis-mcp ingest ./docs && gnosis-mcp serve

Semantic search (local, no API key)
$ pip install "gnosis-mcp[embeddings]"
$ gnosis-mcp ingest ./docs --embed

Cross-encoder rerank (opt-in)
$ pip install "gnosis-mcp[reranking]"
$ GNOSIS_MCP_RERANK_ENABLED=true gnosis-mcp serve

Scale to PostgreSQL
$ pip install "gnosis-mcp[postgres]"
$ export GNOSIS_MCP_DATABASE_URL=postgresql://...

Wire it into Claude Code / Cursor / Windsurf
{
  "mcpServers": {
    "gnosis": {
      "command": "gnosis-mcp",
      "args": ["serve"]
    }
  }
}

Editor snippets for Cursor, Windsurf, VS Code, JetBrains, Cline — view on GitHub.

— Frequently asked

How do I save tokens when my AI coding agent reads documentation?

Point your agent at a gnosis-mcp server instead of pasting full doc files into context. gnosis-mcp does hybrid search (keyword + semantic) against your local index and returns only the relevant chunks: typically ~300–800 tokens instead of 3,000–15,000 for a full file. The more docs you have, the larger the savings.

How do I stop my AI coding agent from hallucinating my own API?

Ground it in your real documentation via the Model Context Protocol. gnosis-mcp indexes your actual codebase, markdown docs, and git history, then exposes them as searchable tools to Claude Code, Cursor, Windsurf, and VS Code. The agent calls search_docs and gets verbatim excerpts with file paths — there is no gap for it to fill with a guess.

How do I give an AI agent access to my private documentation without sending it to OpenAI?

Run gnosis-mcp locally. It's a single Python process with SQLite storage: no cloud uploads, no vector database provider, no API keys. Your documents never leave your machine. MCP is what talks to the agent, and it runs over stdio or a local HTTP port you control.

What is the difference between MCP and traditional RAG?

Traditional RAG is invisible to the user — the application chunks, embeds, and retrieves before the LLM sees anything. MCP flips this: the LLM itself decides when to search, what query to use, and how many results to fetch. It's agent-controlled retrieval. gnosis-mcp implements the MCP spec so agents like Claude Code can orchestrate searches on your private data.
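On the wire, an agent-initiated search is a JSON-RPC 2.0 `tools/call` request, as defined by the MCP specification. The sketch below builds the message a client would send for search_docs. `tool_call_request` is a hypothetical helper; `query` is the only argument shown elsewhere on this page, so any other argument name would be an assumption.

```python
import json
from itertools import count

_request_ids = count(1)


def tool_call_request(tool: str, arguments: dict) -> str:
    """Serialise an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),  # lets the client match the response
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# The model picks the query text itself; nothing is retrieved until it asks.
request = tool_call_request("search_docs", {"query": "how is bearer auth configured"})
```

In practice an MCP client library handles this framing; the point is that the model chooses when to send it and what the query says.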

How do I reduce LLM context window usage for large codebases?

Index the codebase once with `gnosis-mcp ingest ./`, then let your agent call search_docs(query="...") per question. Instead of pasting 50 files into the context window every turn, the agent retrieves the 3–5 chunks that actually matter. For a 100k-token codebase this typically cuts per-turn context to 2–5% of the full codebase.

How do I search git commit history from an AI agent?

Run `gnosis-mcp ingest-git /path/to/repo` to index commit messages, authors, and diff summaries as first-class documents. Agents can then search "why did we remove X" or "commits that touch payment.py" and get real answers from git history — not guesses.
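To make "commits as documents" concrete, here is one way to turn `git log` output into searchable records. It is an illustrative sketch, not how ingest-git is implemented: the field layout, `parse_git_log`, and `read_commits` are assumptions, and diff summaries are left out for brevity.

```python
import subprocess

# Unit separator (0x1f) between fields, record separator (0x1e) between commits.
GIT_FORMAT = "%H%x1f%an%x1f%aI%x1f%s%x1f%b%x1e"


def parse_git_log(raw: str) -> list[dict[str, str]]:
    """Turn raw `git log --pretty=format:GIT_FORMAT` output into documents."""
    docs = []
    for record in raw.split("\x1e"):
        # Strip only newlines: Python treats 0x1f as whitespace, so a bare
        # strip() would eat the separator before an empty commit body.
        record = record.strip("\n")
        if not record:
            continue
        sha, author, date, subject, body = record.split("\x1f")
        docs.append({
            "id": sha,
            "author": author,
            "date": date,
            "text": f"{subject}\n\n{body.strip()}".strip(),
        })
    return docs


def read_commits(repo: str, limit: int = 100) -> list[dict[str, str]]:
    """Read the most recent commits from a local repository."""
    result = subprocess.run(
        ["git", "-C", repo, "log", f"-n{limit}", f"--pretty=format:{GIT_FORMAT}"],
        capture_output=True, text=True, check=True,
    )
    return parse_git_log(result.stdout)
```

Each record can then be indexed like any other document, so a query such as "why did we remove X" matches the commit message that explains it.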

Do I need a vector database or an embedding API key to use gnosis-mcp?

No. gnosis-mcp runs local ONNX embedding models (quantized, CPU-friendly) and stores them in SQLite alongside the FTS5 full-text index. Zero cloud dependencies. Optional: swap to PostgreSQL + pgvector for multi-user or large-scale deployments.
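The keyword half needs nothing beyond the SQLite that ships with Python. A minimal sketch of FTS5 search ranked by BM25, assuming your Python's SQLite build includes FTS5 (most do); the `chunks` table, its columns, and `keyword_search` are made up for the example and are not gnosis-mcp's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(path, body)")
conn.executemany(
    "INSERT INTO chunks (path, body) VALUES (?, ?)",
    [
        ("auth.md", "Bearer tokens authenticate REST requests"),
        ("crawl.md", "The crawler respects robots.txt and caches pages"),
        ("chunk.md", "Chunking never splits code blocks"),
    ],
)


def keyword_search(db: sqlite3.Connection, query: str, limit: int = 5) -> list[tuple[str, float]]:
    """Return (path, score) pairs for an FTS5 query, best match first."""
    # bm25() returns lower values for better matches, so ascending order
    # puts the most relevant row first.
    return db.execute(
        "SELECT path, bm25(chunks) FROM chunks "
        "WHERE chunks MATCH ? ORDER BY bm25(chunks) LIMIT ?",
        (query, limit),
    ).fetchall()


results = keyword_search(conn, "robots")
```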

Can I use gnosis-mcp with Claude Code, Cursor, or Windsurf?

Yes — all three. gnosis-mcp is a standard MCP server. For Claude Code: add to ~/.claude/settings.json under mcpServers. For Cursor: add via Settings → MCP. For Windsurf: add in mcp_config.json. See https://gnosismcp.com/doc/docs/overview for copy-paste configs per IDE.

What embedder does gnosis-mcp use and how good is search quality?

gnosis-mcp bundles MongoDB/mdbr-leaf-ir, ranked #1 on the MTEB/BEIR leaderboard for models ≤ 100 M params. Our keyword path lands within 0.01 nDCG@10 of the Lucene BM25 baseline on SciFact (0.671 vs 0.679). Every release runs 632 tests.