Gnosis MCP — Full Reference
Zero-config MCP server that makes your markdown docs searchable by AI agents. SQLite default, PostgreSQL optional. Works with Claude Code, Cursor, Windsurf, Cline. PyPI: gnosis-mcp | CLI: gnosis-mcp | Import: gnosis_mcp
Install
pip install gnosis-mcp # SQLite (default, zero config)
pip install gnosis-mcp[embeddings] # + Local ONNX semantic search (no API key)
pip install gnosis-mcp[postgres] # + PostgreSQL support
pip install gnosis-mcp[web] # + Web crawl (httpx + trafilatura)
Quick Setup (SQLite)
gnosis-mcp ingest ./docs/ # Auto-creates DB + loads markdown
gnosis-mcp search "query" # Test it works
gnosis-mcp serve # Start MCP server
Quick Setup (SQLite + Semantic Search)
pip install gnosis-mcp[embeddings]
gnosis-mcp ingest ./docs/ --embed # Ingest + embed (downloads 23MB model on first run)
gnosis-mcp serve # Hybrid keyword+semantic search auto-activated
Quick Setup (PostgreSQL)
export GNOSIS_MCP_DATABASE_URL="postgresql://user:pass@localhost:5432/mydb"
gnosis-mcp init-db # Create tables (idempotent)
gnosis-mcp ingest ./docs/ # Load markdown files
gnosis-mcp check # Verify connection + schema
gnosis-mcp serve
Editor Config
The same JSON structure works in every editor. Add it to the appropriate config file:
| Editor | Config File |
|---|---|
| Claude Code | .claude/mcp.json |
| Cursor | .cursor/mcp.json |
| VS Code (Copilot) | .vscode/mcp.json (note: uses "servers" not "mcpServers") |
| Windsurf | ~/.codeium/windsurf/mcp_config.json |
| JetBrains | Settings > Tools > AI Assistant > MCP Servers |
| Cline | Cline MCP settings panel |
SQLite (no env needed):
{
"mcpServers": {
"docs": {
"command": "gnosis-mcp",
"args": ["serve"]
}
}
}
PostgreSQL:
{
"mcpServers": {
"docs": {
"command": "gnosis-mcp",
"args": ["serve"],
"env": {
"GNOSIS_MCP_DATABASE_URL": "postgresql://user:pass@localhost:5432/mydb"
}
}
}
}
Backends
| | SQLite (default) | SQLite + embeddings | PostgreSQL |
|---|---|---|---|
| Install | pip install gnosis-mcp | pip install gnosis-mcp[embeddings] | pip install gnosis-mcp[postgres] |
| Config | Nothing | Nothing | Set DATABASE_URL |
| Search | FTS5 keyword (BM25) | Hybrid keyword+semantic (RRF) | tsvector + pgvector hybrid |
| Embeddings | None | Local ONNX (23MB, no API) | Any provider + HNSW index |
| Multi-table | No | No | Yes (UNION ALL) |
Auto-detection: DATABASE_URL set to postgresql://... -> PostgreSQL. Not set -> SQLite. Override: GNOSIS_MCP_BACKEND=sqlite|postgres.
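The hybrid mode ranks by Reciprocal Rank Fusion. A minimal sketch of RRF as commonly defined (the function name and the conventional k=60 constant are illustrative, not the package's internals):

```python
def rrf_merge(keyword_ranked, semantic_ranked, k=60):
    """Merge two ranked lists of doc IDs with Reciprocal Rank Fusion.

    Each document scores 1/(k + rank) per list it appears in; the
    combined score decides the final order. k=60 is the conventional
    damping constant from the original RRF paper.
    """
    scores = {}
    for ranked in (keyword_ranked, semantic_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A doc ranked well in both lists beats one ranked first in only one:
merged = rrf_merge(["a", "b", "c"], ["b", "c", "d"])  # → ["b", "c", "a", "d"]
```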
The [embeddings] extra installs: onnxruntime, tokenizers, numpy, sqlite-vec. Default model: MongoDB/mdbr-leaf-ir (23M params, 23MB quantized). Model auto-downloads from HuggingFace via stdlib urllib on first use. Customize with GNOSIS_MCP_EMBED_MODEL.
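Matryoshka-style truncation (what GNOSIS_MCP_EMBED_DIM controls for the local model) keeps the first N dimensions and re-normalizes. A sketch of the standard recipe — the helper name is an assumption, not the package's code:

```python
import math

def truncate_embedding(vec, dim=384):
    """Keep the first `dim` components and re-normalize to unit length,
    the usual recipe for Matryoshka-trained embedding models."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head

v = truncate_embedding([0.5, 0.5, 0.5, 0.5], dim=2)  # 2-dim unit vector
```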
Tools (6)
Read Tools (always available)
search_docs(query, category?, limit?, query_embedding?) — Search docs using keyword (FTS5/tsvector) or hybrid semantic+keyword search. Returns a highlight field with matched terms in <mark> tags.
- query: string (required) — search text
- category: string (optional) — filter by category
- limit: int (default 5, max configurable) — result count
- query_embedding: list[float] (optional) — pre-computed embedding for hybrid search (PostgreSQL)
get_doc(path, max_length?) — Get full document by file path. Reassembles chunks in order.
- path: string (required) — e.g. "guides/quickstart.md"
- max_length: int (optional) — truncate at N characters
get_related(path) — Find related documents via bidirectional link graph.
- path: string (required)
Write Tools (require GNOSIS_MCP_WRITABLE=true)
upsert_doc(path, content, title?, category?, audience?, tags?, embeddings?) — Insert or replace document. Auto-chunks at paragraph boundaries. Optional embeddings accepts pre-computed vectors (one per chunk).
delete_doc(path) — Delete document, its chunks, and links.
update_metadata(path, title?, category?, audience?, tags?) — Update metadata fields on all chunks.
Resources (3)
- gnosis://docs — List all documents with title, category, chunk count
- gnosis://docs/{path} — Read document content by path
- gnosis://categories — List categories with document counts
REST API (v0.10.0+)
Enable native HTTP endpoints alongside MCP on the same port. Uses Starlette (bundled with mcp>=1.20, no new dependencies).
Enable: gnosis-mcp serve --transport streamable-http --rest
Or set: GNOSIS_MCP_REST=true
| Endpoint | Description |
|---|---|
| GET /health | {"status": "ok", "version", "backend", "docs"} |
| GET /api/search?q=&limit=&category= | {"results": [...], "query", "count"} — auto-embeds with local provider |
| GET /api/docs/{path} | {"title", "content", "category", "audience", "tags", "chunks"} |
| GET /api/docs/{path}/related | {"results": [{"related_path", "relation_type", "direction"}]} |
| GET /api/categories | [{"category", "docs"}] |
| Env Variable | Description |
|---|---|
| GNOSIS_MCP_REST | true/1/yes to enable REST API |
| GNOSIS_MCP_CORS_ORIGINS | * or comma-separated origins (e.g. http://localhost:5174) |
| GNOSIS_MCP_API_KEY | Bearer token required in Authorization: Bearer <key> |
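Calling the REST API needs only the Python stdlib. A sketch of building an authenticated search request — the helper name and base URL are illustrative assumptions:

```python
import urllib.parse
import urllib.request

def build_search_request(base_url, query, limit=5, api_key=None):
    """Build a GET /api/search request, adding the bearer token that
    GNOSIS_MCP_API_KEY requires when set."""
    params = urllib.parse.urlencode({"q": query, "limit": limit})
    req = urllib.request.Request(f"{base_url}/api/search?{params}")
    if api_key:
        req.add_header("Authorization", f"Bearer {api_key}")
    return req

req = build_search_request("http://127.0.0.1:8000", "chunking", api_key="secret")
# urllib.request.urlopen(req) would return the JSON results payload
```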
Configuration (Environment Variables)
All settings via GNOSIS_MCP_* environment variables. Nothing required for SQLite.
Core Settings
- GNOSIS_MCP_DATABASE_URL — PostgreSQL URL or SQLite file path (default: SQLite at ~/.local/share/gnosis-mcp/docs.db)
- GNOSIS_MCP_BACKEND — Force backend: auto, sqlite, postgres (default: auto)
- GNOSIS_MCP_SCHEMA — Database schema, PostgreSQL only (default: public)
- GNOSIS_MCP_CHUNKS_TABLE — Chunks table name, comma-separated for multi-table on PG (default: documentation_chunks)
- GNOSIS_MCP_LINKS_TABLE — Links table name (default: documentation_links)
- GNOSIS_MCP_SEARCH_FUNCTION — Custom search function, PostgreSQL only (default: none)
- GNOSIS_MCP_EMBEDDING_DIM — Embedding vector dimension for init-db (default: 1536)
- GNOSIS_MCP_POOL_MIN — Min pool connections, PostgreSQL only (default: 1)
- GNOSIS_MCP_POOL_MAX — Max pool connections, PostgreSQL only (default: 3)
- GNOSIS_MCP_WRITABLE — Enable write tools: true/1/yes (default: false)
- GNOSIS_MCP_WEBHOOK_URL — URL to POST on doc changes (default: none)
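The backend auto-detection described by GNOSIS_MCP_BACKEND and GNOSIS_MCP_DATABASE_URL amounts to a prefix check. A sketch of the documented behavior (the function name is an assumption, not the package's code):

```python
import os

def detect_backend(env=os.environ):
    """Mirror the documented auto-detection: an explicit
    GNOSIS_MCP_BACKEND wins; otherwise a postgresql:// DATABASE_URL
    selects PostgreSQL and anything else falls back to SQLite."""
    override = env.get("GNOSIS_MCP_BACKEND", "auto")
    if override in ("sqlite", "postgres"):
        return override
    url = env.get("GNOSIS_MCP_DATABASE_URL", "")
    return "postgres" if url.startswith("postgresql://") else "sqlite"

detect_backend({})                                                     # "sqlite"
detect_backend({"GNOSIS_MCP_DATABASE_URL": "postgresql://u:p@h/db"})   # "postgres"
```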
Embedding
- GNOSIS_MCP_EMBED_PROVIDER — Embedding provider: openai, ollama, custom, or local (default: none, auto-detects local if [embeddings] installed)
- GNOSIS_MCP_EMBED_MODEL — Embedding model name (default: text-embedding-3-small for remote, MongoDB/mdbr-leaf-ir for local)
- GNOSIS_MCP_EMBED_DIM — Embedding dimension for local Matryoshka truncation and vec0 table width (default: 384)
- GNOSIS_MCP_EMBED_API_KEY — API key for embedding provider (default: none)
- GNOSIS_MCP_EMBED_URL — Custom embedding endpoint URL (default: none)
- GNOSIS_MCP_EMBED_BATCH_SIZE — Chunks per embedding batch, min 1 (default: 50)
Tuning
- GNOSIS_MCP_CONTENT_PREVIEW_CHARS — Characters in search previews, min 50 (default: 200)
- GNOSIS_MCP_CHUNK_SIZE — Max chars per chunk, min 500 (default: 4000)
- GNOSIS_MCP_SEARCH_LIMIT_MAX — Max search result limit, min 1 (default: 20)
- GNOSIS_MCP_WEBHOOK_TIMEOUT — Webhook timeout seconds, min 1 (default: 5)
- GNOSIS_MCP_TRANSPORT — Server transport: stdio, sse, or streamable-http (default: stdio)
- GNOSIS_MCP_HOST — Bind address for HTTP transports (default: 127.0.0.1)
- GNOSIS_MCP_PORT — Port for HTTP transports (default: 8000)
- GNOSIS_MCP_LOG_LEVEL — Logging: DEBUG/INFO/WARNING/ERROR/CRITICAL (default: INFO)
Column Overrides (for existing tables with non-standard names)
- GNOSIS_MCP_COL_FILE_PATH (default: file_path)
- GNOSIS_MCP_COL_TITLE (default: title)
- GNOSIS_MCP_COL_CONTENT (default: content)
- GNOSIS_MCP_COL_CHUNK_INDEX (default: chunk_index)
- GNOSIS_MCP_COL_CATEGORY (default: category)
- GNOSIS_MCP_COL_AUDIENCE (default: audience)
- GNOSIS_MCP_COL_TAGS (default: tags)
- GNOSIS_MCP_COL_EMBEDDING (default: embedding)
- GNOSIS_MCP_COL_TSV (default: tsv)
- GNOSIS_MCP_COL_SOURCE_PATH (default: source_path)
- GNOSIS_MCP_COL_TARGET_PATH (default: target_path)
- GNOSIS_MCP_COL_RELATION_TYPE (default: relation_type)
Custom Search Function (PostgreSQL)
Your function must accept:
(p_query_text text, p_categories text[], p_limit integer)
And return columns: file_path, title, content, category, combined_score.
Optionally, your function can also accept p_embedding vector(N) for hybrid search. Gnosis will try passing it automatically when query_embedding is provided.
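A skeleton satisfying that contract might look like the following — the function name, table, and ts_rank scoring are placeholders to adapt to your schema, not a prescribed implementation:

```sql
CREATE OR REPLACE FUNCTION my_search(
    p_query_text text,
    p_categories text[],
    p_limit integer
) RETURNS TABLE (
    file_path text,
    title text,
    content text,
    category text,
    combined_score double precision
) AS $$
    SELECT c.file_path, c.title, c.content, c.category,
           ts_rank(c.tsv, plainto_tsquery('english', p_query_text))::double precision
    FROM documentation_chunks c
    WHERE c.tsv @@ plainto_tsquery('english', p_query_text)
      AND (p_categories IS NULL OR c.category = ANY(p_categories))
    ORDER BY 5 DESC
    LIMIT p_limit;
$$ LANGUAGE sql STABLE;
```

Point GNOSIS_MCP_SEARCH_FUNCTION at the function name to have it used in place of the built-in query.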
CLI
gnosis-mcp ingest <path> [--dry-run] [--force] [--embed] # Load files (.md/.txt/.ipynb/.toml/.csv/.json)
gnosis-mcp crawl <url> [--sitemap] [--depth N] [--include] [--exclude] [--dry-run] [--force] [--embed]
gnosis-mcp serve [--transport stdio|sse|streamable-http] [--host H] [--port P] [--ingest PATH] [--watch PATH]
gnosis-mcp search <query> [-n LIMIT] [-c CAT] [--embed] # Search (--embed for hybrid semantic+keyword)
gnosis-mcp stats # Show document/chunk/embedding counts
gnosis-mcp check # Verify connection + sqlite-vec status
gnosis-mcp embed [--provider P] [--model M] [--dry-run] # Backfill embeddings (auto-detects local provider)
gnosis-mcp init-db [--dry-run] # Create tables (or preview SQL)
gnosis-mcp export [-f json|markdown|csv] [-c CAT] # Export documents
gnosis-mcp ingest-git <repo> [--since S] [--max-commits N] [--include P] [--exclude P] [--dry-run] [--embed] [--merges]
gnosis-mcp diff <path> # Show what would change on re-ingest
gnosis-mcp --version # Show version
Git History Ingestion
gnosis-mcp ingest-git <repo-path> converts git commit history into searchable markdown documents. Zero new dependencies — uses git log via subprocess.
gnosis-mcp ingest-git . # Current repo, all files
gnosis-mcp ingest-git /path/to/repo --since 6m # Last 6 months only
gnosis-mcp ingest-git . --include "src/*" --max-commits 5 # Filtered + limited
gnosis-mcp ingest-git . --dry-run # Preview without ingesting
gnosis-mcp ingest-git . --embed # Embed for semantic search
- One markdown document per file with meaningful commit history
- Each commit becomes an H2 section with date, author, subject, body
- Stored as git-history/<file-path> to avoid collision with source docs
- Category set to git-history for scoped searches (search_docs(query, category="git-history"))
- Auto-links to source file paths via relates_to graph
- Content hashing for incremental re-ingest (skips files with unchanged history)
- --merges flag includes merge commits (skipped by default)
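The per-file grouping above reduces to a pure parsing pass over git log output with file names listed per commit. A sketch under assumed formatting (the sentinel format string and function name are illustrative, not the package's actual parser):

```python
from collections import defaultdict

def group_commits_by_file(log_text):
    """Parse output shaped like
    `git log --pretty=format:'COMMIT|%h|%ad|%an|%s' --name-only`
    into {file_path: [commit dict, ...]} for per-file documents."""
    by_file = defaultdict(list)
    commit = None
    for line in log_text.splitlines():
        if line.startswith("COMMIT|"):
            _, sha, date, author, subject = line.split("|", 4)
            commit = {"sha": sha, "date": date, "author": author, "subject": subject}
        elif line.strip() and commit is not None:
            by_file[line.strip()].append(commit)  # file touched by this commit
    return dict(by_file)

sample = """COMMIT|a1b2c3d|2024-05-01|Ada|Add chunker
src/ingest.py
docs/guide.md

COMMIT|e4f5a6b|2024-05-02|Ada|Fix chunk overlap
src/ingest.py
"""
history = group_commits_by_file(sample)  # src/ingest.py → 2 commits
```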
Web Crawl
gnosis-mcp crawl <url> fetches and ingests documentation from any website. Requires the [web] extra (pip install gnosis-mcp[web]).
gnosis-mcp crawl https://docs.stripe.com/ --sitemap # Crawl via sitemap
gnosis-mcp crawl https://fastapi.tiangolo.com/ --depth 2 # BFS link crawl with depth limit
gnosis-mcp crawl https://docs.python.org/ --dry-run # Preview discovered URLs
gnosis-mcp crawl https://docs.example.com/ --sitemap --embed # Crawl + embed for semantic search
- Sitemap.xml discovery (--sitemap) and BFS link crawling (--depth N)
- robots.txt compliance — respects Disallow rules
- ETag/Last-Modified HTTP caching for incremental re-crawl (304 Not Modified)
- Content hashing: skips unchanged pages on re-crawl
- URL path filtering with --include and --exclude glob patterns
- Rate-limited concurrent fetching (5 concurrent, 0.2s delay)
- SSRF protection: blocks private/internal IPs and checks redirect targets
- Crawled pages stored with URL as file_path, hostname as category
- Force re-crawl with --force, dry run with --dry-run, embed with --embed
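The SSRF guard boils down to rejecting any crawl target whose resolved address is private, loopback, or link-local. A minimal stdlib sketch (the function name is an assumption; the real crawler also re-checks redirect targets):

```python
import ipaddress
import socket

def is_blocked_host(hostname):
    """Return True if any resolved address for `hostname` falls in a
    private, loopback, or link-local range — the targets an SSRF guard
    must refuse to fetch."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return True  # treat unresolvable hosts as blocked
    for info in infos:
        addr = ipaddress.ip_address(info[4][0])
        if addr.is_private or addr.is_loopback or addr.is_link_local:
            return True
    return False

is_blocked_host("127.0.0.1")  # True — loopback
is_blocked_host("10.0.0.8")   # True — RFC 1918 private range
```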
Ingest
gnosis-mcp ingest <path> scans a file or directory for supported files (.md, .txt, .ipynb, .toml, .csv, .json) and loads them into the database. Non-markdown formats are auto-converted using Python stdlib only — zero extra dependencies.
- Chunks by H2 headers (H3/H4 for oversized sections). Never splits inside fenced code blocks or tables
- Parses YAML-like frontmatter for title, category, audience, tags
- Auto-linking: relates_to in frontmatter populates the links table (supports comma-separated and YAML list, skips glob patterns)
- Content hashing: skips unchanged files on re-run
- Watch mode: gnosis-mcp serve --watch ./docs/ auto-re-ingests on file changes (mtime polling + debounce + auto-embed)
- Category inferred from parent directory name
- Title extracted from first H1 heading
- Skips tiny files (<50 chars)
- Use --dry-run to preview without writing
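The H2 chunking rule — split on ## headings, but never inside a fenced code block — can be sketched with a simple fence-state toggle (the helper name is illustrative; the real chunker also handles H3/H4 fallback and tables):

```python
def chunk_by_h2(markdown):
    """Split markdown into chunks at H2 headings, tracking fence state
    so a '## ' line inside a fenced code block never starts a chunk."""
    chunks, current, in_fence = [], [], False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # toggle on every fence delimiter
        if line.startswith("## ") and not in_fence and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks

doc = "# Title\nintro\n## A\nbody\n```\n## not a heading\n```\n## B\nend"
parts = chunk_by_h2(doc)  # 3 chunks; the fenced '##' stays inside chunk 2
```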
Architecture
src/gnosis_mcp/
├── backend.py # DocBackend Protocol + create_backend() factory
├── pg_backend.py # PostgreSQL backend — asyncpg, tsvector, pgvector, UNION ALL
├── sqlite_backend.py # SQLite backend — aiosqlite, FTS5 MATCH + bm25()
├── sqlite_schema.py # SQLite DDL — tables, FTS5, triggers, indexes
├── config.py # GnosisMcpConfig frozen dataclass, backend auto-detection
├── db.py # Backend lifecycle + FastMCP lifespan
├── server.py # FastMCP server: 6 tools + 3 resources + webhook helper
├── ingest.py # File scanner + converters: multi-format, smart chunking (H2/H3/H4), hashing
├── crawl.py # Web crawler — sitemap/BFS discovery, robots.txt, ETag caching, trafilatura
├── parsers/ # Non-file ingest sources
│ └── git_history.py # Git log → markdown documents per file (commit parsing, grouping, rendering)
├── watch.py # File watcher: mtime polling, auto-re-ingest on changes
├── schema.py # PostgreSQL DDL — tables, indexes, HNSW, hybrid search functions
├── embed.py # Embedding sidecar: provider abstraction (openai/ollama/custom/local)
├── local_embed.py # Local ONNX embedding engine — stdlib urllib model download
└── cli.py # argparse CLI: serve, init-db, ingest, ingest-git, crawl, search, embed, stats, export, diff, check
Default install deps: mcp + aiosqlite. Optional: asyncpg (via [postgres] extra), onnxruntime + tokenizers + numpy + sqlite-vec (via [embeddings] extra), httpx + trafilatura (via [web] extra). Model download uses stdlib urllib (no huggingface-hub dependency).
Performance
- SQLite FTS5 keyword: 9,463 QPS on 100 docs (300 chunks); 471 QPS on 10,000 docs (30,000 chunks); p95 under 6 ms at the 10K corpus
- End-to-end through the MCP stdio protocol: 8.7 ms mean, 13.0 ms p95 (v0.11.0, after the mcp SDK 1.27 transport upgrade)
- 632 tests; 10 RAG eval cases (Hit@5 = 1.00, MRR = 0.95, Precision@5 = 0.67)
- Install size: ~23MB with [embeddings] (ONNX model), ~5MB base
- Benchmarks: gnosis-mcp eval, python tests/bench/bench_search.py, python tests/bench/bench_rag.py, python tests/bench/bench_mcp_e2e.py. See docs/benchmarks.md for methodology.