# Gnosis MCP
Zero-config MCP server that makes your markdown docs searchable by AI agents. SQLite default, PostgreSQL optional. Works with Claude Code, Cursor, Windsurf, Cline.
## What it does
Gnosis MCP loads documentation files into a database, chunks them by headings (H2/H3/H4, never splitting code blocks or tables; see the sketch below), and exposes them as MCP tools and resources. AI agents call `search_docs` to find relevant documentation instead of guessing or reading entire files.

- **Formats:** `.md`, `.txt`, `.ipynb`, `.toml`, `.csv`, `.json` (stdlib only), plus optional `.rst` (`[rst]` extra) and `.pdf` (`[pdf]` extra).
- **Live re-ingest:** pass `--watch` to re-ingest automatically when files change.
- **Web crawling:** `gnosis-mcp crawl <url>` pulls documentation from any website (requires the `[web]` extra).
- **Git history:** `gnosis-mcp ingest-git <repo>` ingests commit history as searchable context, with zero new dependencies.
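As an illustration of the chunking rules (a hypothetical file, not part of this repo), a doc like this yields one chunk per H2/H3 section:

```markdown
# API Guide

## Authentication        <- chunk 1 (H2 boundary)
Token setup steps...

## Endpoints             <- chunk 2
### GET /users           <- chunk 3 (H3 boundary)
Any fenced code block or table under a heading stays whole inside its chunk.
```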
## Quick Start
```bash
pip install gnosis-mcp
gnosis-mcp ingest ./docs/   # auto-creates SQLite database + loads docs
gnosis-mcp serve            # start MCP server (stdio default, or --transport streamable-http)

# Web crawl (optional)
pip install "gnosis-mcp[web]"
gnosis-mcp crawl https://docs.example.com/ --sitemap   # crawl docs from the web

# Git history (no extra deps)
gnosis-mcp ingest-git .                                # ingest commit history as searchable docs
gnosis-mcp ingest-git . --since 6m --include "src/*"   # filtered + time-limited
```
Add to your editor's MCP config (e.g. `.claude/mcp.json` for Claude Code):

```json
{
  "mcpServers": {
    "docs": {
      "command": "gnosis-mcp",
      "args": ["serve"]
    }
  }
}
```
## Tools

### Re-indexing a re-organized corpus

When your knowledge folder changes significantly (files moved or deleted), use one of:

- `gnosis-mcp ingest ./docs --prune` — ingest new/changed files, remove DB chunks whose source file no longer exists
- `gnosis-mcp ingest ./docs --wipe` — full reset before re-ingest (nuclear)
- `gnosis-mcp prune ./docs --dry-run` — preview what would be deleted without touching anything
- `gnosis-mcp prune ./docs` — delete stale chunks only, no ingest

Pruning is scoped to the given root; crawled URLs are preserved unless `--include-crawled` is passed.

### Read Tools (always available)
#### `search_docs(query: str, category?: str, limit?: int, query_embedding?: list[float], rerank?: bool) -> JSON`

Search documentation using keyword (FTS5/tsvector) or hybrid semantic + keyword search.

When `query_embedding` is provided, hybrid mode merges BM25 and cosine scores via Reciprocal Rank Fusion (tune via `GNOSIS_MCP_RRF_K`, default 60). When `rerank=true` (or `GNOSIS_MCP_RERANK_ENABLED=true`), a cross-encoder re-scores the top-N candidates (requires the `[reranking]` extra).

Returns: `[{"file_path", "title", "content_preview", "score", "highlight", "rerank_score"?}]`

The `highlight` field contains matched terms in `<mark>` tags (FTS5 snippet on SQLite, `ts_headline` on PostgreSQL).
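To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists (an illustration of the formula, not the server's internal code; `k` corresponds to `GNOSIS_MCP_RRF_K`):

```python
def rrf_merge(keyword_ranked: list[str], semantic_ranked: list[str], k: int = 60) -> list[str]:
    """Merge two ranked lists of doc ids via Reciprocal Rank Fusion.

    Each doc scores sum(1 / (k + rank)) over every list it appears in;
    k=60 matches the GNOSIS_MCP_RRF_K default.
    """
    scores: dict[str, float] = {}
    for ranking in (keyword_ranked, semantic_ranked):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)

# "b" (ranked #2 by keyword, #1 by semantic) beats "a" (#1 and #3):
print(rrf_merge(["a", "b"], ["b", "c", "a"]))  # -> ['b', 'a', 'c']
```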
#### `get_doc(path: str, max_length?: int) -> JSON`

Retrieve a full document by file path. Reassembles chunks in order.

Returns: `{"title", "content", "category", "audience", "tags"}`
#### `get_related(path: str) -> JSON`

Find related documents via the bidirectional link graph.

Returns: `[{"related_path", "relation_type", "direction"}]`
### Write Tools (requires `GNOSIS_MCP_WRITABLE=true`)
#### `upsert_doc(path: str, content: str, title?: str, category?: str, audience?: str, tags?: list, embeddings?: list[list[float]]) -> JSON`

Insert or replace a document. Auto-splits into chunks at paragraph boundaries. The optional `embeddings` parameter accepts pre-computed vectors (one per chunk).

Returns: `{"path", "chunks", "action": "upserted"}`
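A minimal client-side sketch using the official `mcp` Python SDK (the path and content are placeholders; write tools only appear when `GNOSIS_MCP_WRITABLE=true` is set):

```python
import asyncio
import os

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Launch the server as a stdio subprocess with write tools enabled.
    params = StdioServerParameters(
        command="gnosis-mcp",
        args=["serve"],
        env={**os.environ, "GNOSIS_MCP_WRITABLE": "true"},
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "upsert_doc",
                {
                    "path": "notes/scratch.md",  # placeholder path
                    "content": "## Scratch\nNotes worth indexing.",
                    "category": "notes",
                },
            )
            print(result.content[0].text)  # {"path", "chunks", "action": "upserted"}

asyncio.run(main())
```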
#### `delete_doc(path: str) -> JSON`

Delete a document and all its chunks + related links.

Returns: `{"path", "chunks_deleted", "links_deleted", "action": "deleted"}`
#### `update_metadata(path: str, title?: str, category?: str, audience?: str, tags?: list) -> JSON`

Update metadata on all chunks of a document. Only provided fields change.

Returns: `{"path", "chunks_updated", "action": "metadata_updated"}`
## Resources
- `gnosis://docs` -- list all documents (path, title, category, chunk count)
- `gnosis://docs/{path}` -- read document content by path
- `gnosis://categories` -- list categories with doc counts
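Resources are fetched by URI; a sketch with the same `mcp` SDK as above:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    params = StdioServerParameters(command="gnosis-mcp", args=["serve"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # List every document the server knows about.
            listing = await session.read_resource("gnosis://docs")
            print(listing.contents[0].text)

asyncio.run(main())
```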
## REST API (v0.10.0+)
Enable with the `--rest` flag or `GNOSIS_MCP_REST=true`. Runs alongside MCP on the same HTTP port. Endpoints: `GET /health`, `GET /api/search?q=&limit=&category=`, `GET /api/docs/{path}`, `GET /api/docs/{path}/related`, `GET /api/categories`. Optional CORS (`GNOSIS_MCP_CORS_ORIGINS`) and API key auth (`GNOSIS_MCP_API_KEY`). No new dependencies — uses Starlette, already bundled with `mcp>=1.20`.
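A quick smoke test using only the standard library (host and port are illustrative; see `GNOSIS_MCP_HOST` / `GNOSIS_MCP_PORT`):

```python
import json
import urllib.parse
import urllib.request

# Assumes a server started with: gnosis-mcp serve --transport streamable-http --rest
BASE = "http://127.0.0.1:8000"

query = urllib.parse.urlencode({"q": "chunking", "limit": 3})
with urllib.request.urlopen(f"{BASE}/api/search?{query}") as resp:
    hits = json.load(resp)
print(json.dumps(hits, indent=2))
```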
## Backends
|  | SQLite (default) | SQLite + embeddings | PostgreSQL |
|---|---|---|---|
| Install | `pip install gnosis-mcp` | `pip install gnosis-mcp[embeddings]` | `pip install gnosis-mcp[postgres]` |
| Config | Nothing | Nothing | Set `DATABASE_URL` |
| Search | FTS5 keyword (BM25) | Hybrid keyword + semantic (RRF) | tsvector + pgvector hybrid |
Auto-detection: `GNOSIS_MCP_DATABASE_URL` set to `postgresql://...` -> PostgreSQL. Not set -> SQLite at `~/.local/share/gnosis-mcp/docs.db`. Override with `GNOSIS_MCP_BACKEND=sqlite|postgres`.
Ingestion extracts `relates_to` from frontmatter (comma-separated or YAML list) and populates the links table for `get_related` queries.
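For example, a hypothetical doc whose frontmatter links it to two other files:

```markdown
---
title: Search Guide
category: guides
relates_to: architecture.md, api-reference.md
---

## How search works
...
```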
Local embeddings (`[embeddings]` extra): ONNX Runtime + tokenizers + sqlite-vec. Default model: `MongoDB/mdbr-leaf-ir` (23 MB, Apache 2.0). No API key needed.
## Configuration
Set `GNOSIS_MCP_DATABASE_URL` (or `DATABASE_URL`) for PostgreSQL; leave it unset for SQLite.

- **Optional backend settings:** `GNOSIS_MCP_BACKEND`, `GNOSIS_MCP_SCHEMA`, `GNOSIS_MCP_CHUNKS_TABLE` (comma-separated for multi-table on PG), `GNOSIS_MCP_LINKS_TABLE`, `GNOSIS_MCP_SEARCH_FUNCTION`, `GNOSIS_MCP_EMBEDDING_DIM`, `GNOSIS_MCP_WRITABLE`, `GNOSIS_MCP_WEBHOOK_URL`, `GNOSIS_MCP_COL_*` for column names.
- **Embedding:** `GNOSIS_MCP_EMBED_PROVIDER` (openai/ollama/custom/local), `GNOSIS_MCP_EMBED_MODEL`, `GNOSIS_MCP_EMBED_DIM` (384, for local Matryoshka truncation), `GNOSIS_MCP_EMBED_API_KEY`, `GNOSIS_MCP_EMBED_URL`, `GNOSIS_MCP_EMBED_BATCH_SIZE`.
- **Tuning:** `GNOSIS_MCP_CONTENT_PREVIEW_CHARS`, `GNOSIS_MCP_CHUNK_SIZE`, `GNOSIS_MCP_SEARCH_LIMIT_MAX`, `GNOSIS_MCP_WEBHOOK_TIMEOUT`, `GNOSIS_MCP_TRANSPORT` (stdio/sse/streamable-http), `GNOSIS_MCP_HOST`, `GNOSIS_MCP_PORT`, `GNOSIS_MCP_LOG_LEVEL`.
## Database Schema
Chunks table: `(file_path, chunk_index, title, content, category, audience, tags, embedding, content_hash)`

Links table: `(source_path, target_path, relation_type)`

Tables are auto-created on first ingest. Run `gnosis-mcp init-db` to create them manually, or add `--dry-run` to preview the SQL.
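Since the default backend is a plain SQLite file, you can inspect ingested chunks directly; a sketch assuming the default DB path above and a default table name of `chunks` (an assumption; override via `GNOSIS_MCP_CHUNKS_TABLE`):

```python
import sqlite3
from pathlib import Path

# Default SQLite location (see Backends above); adjust if yours differs.
db_path = Path.home() / ".local/share/gnosis-mcp/docs.db"

con = sqlite3.connect(db_path)
for file_path, chunk_index, title in con.execute(
    "SELECT file_path, chunk_index, title FROM chunks "
    "ORDER BY file_path, chunk_index LIMIT 10"
):
    print(f"{file_path}#{chunk_index}: {title}")
con.close()
```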
## Editor Config Locations
| Editor | Config Path |
|---|---|
| Claude Code | .claude/mcp.json |
| Cursor | .cursor/mcp.json |
| VS Code (Copilot) | .vscode/mcp.json (uses "servers" key, not "mcpServers") |
| Windsurf | ~/.codeium/windsurf/mcp_config.json |
| JetBrains | Settings > Tools > AI Assistant > MCP Servers |
| Cline | Cline MCP settings panel |
## Performance
- Throughput: 9,463 QPS on 100 docs (300 chunks); 471 QPS on 10,000 docs (30,000 chunks).
- Latency: p95 under 0.2 ms at 100 docs, under 6 ms at 10,000.
- End-to-end through the MCP stdio protocol: 8.7 ms mean, 13.0 ms p95 (v0.11.0, SDK 1.27 transport).
- Tests and evals: 632 tests; 10 RAG eval cases (Hit@5 = 1.00, MRR = 0.95, Precision@5 = 0.67).
- Install size: ~23 MB with `[embeddings]` (ONNX model), ~5 MB base.

Reproduce with `gnosis-mcp eval`, `python tests/bench/bench_search.py`, `python tests/bench/bench_rag.py`, `python tests/bench/bench_mcp_e2e.py`.
## Documentation
- README: Quick start, editor setup, backend comparison, configuration, performance
- `llms-full.txt`: Complete reference in one file
- `llms-install.md`: Step-by-step install guide