Gnosis MCP — Full Reference

Zero-config MCP server that makes your markdown docs searchable by AI agents. SQLite default, PostgreSQL optional. Works with Claude Code, Cursor, Windsurf, Cline. PyPI: gnosis-mcp | CLI: gnosis-mcp | Import: gnosis_mcp

Install

pip install gnosis-mcp               # SQLite (default, zero config)
pip install gnosis-mcp[embeddings]   # + Local ONNX semantic search (no API key)
pip install gnosis-mcp[postgres]     # + PostgreSQL support
pip install gnosis-mcp[web]          # + Web crawl (httpx + trafilatura)

Quick Setup (SQLite)

gnosis-mcp ingest ./docs/   # Auto-creates DB + loads markdown
gnosis-mcp search "query"   # Test it works
gnosis-mcp serve            # Start MCP server

Quick Setup (SQLite + Semantic Search)

pip install gnosis-mcp[embeddings]
gnosis-mcp ingest ./docs/ --embed   # Ingest + embed (downloads 23MB model on first run)
gnosis-mcp serve                    # Hybrid keyword+semantic search auto-activated

Quick Setup (PostgreSQL)

export GNOSIS_MCP_DATABASE_URL="postgresql://user:pass@localhost:5432/mydb"
gnosis-mcp init-db          # Create tables (idempotent)
gnosis-mcp ingest ./docs/   # Load markdown files
gnosis-mcp check            # Verify connection + schema
gnosis-mcp serve

Editor Config

The same JSON structure works in every editor. Add it to the appropriate config file:

Editor             Config file
Claude Code        .claude/mcp.json
Cursor             .cursor/mcp.json
VS Code (Copilot)  .vscode/mcp.json (note: uses "servers" not "mcpServers")
Windsurf           ~/.codeium/windsurf/mcp_config.json
JetBrains          Settings > Tools > AI Assistant > MCP Servers
Cline              Cline MCP settings panel

SQLite (no env needed):

{
  "mcpServers": {
    "docs": {
      "command": "gnosis-mcp",
      "args": ["serve"]
    }
  }
}

PostgreSQL:

{
  "mcpServers": {
    "docs": {
      "command": "gnosis-mcp",
      "args": ["serve"],
      "env": {
        "GNOSIS_MCP_DATABASE_URL": "postgresql://user:pass@localhost:5432/mydb"
      }
    }
  }
}

Backends

             SQLite (default)         SQLite + embeddings                  PostgreSQL
Install      pip install gnosis-mcp   pip install gnosis-mcp[embeddings]   pip install gnosis-mcp[postgres]
Config       Nothing                  Nothing                              Set GNOSIS_MCP_DATABASE_URL
Search       FTS5 keyword (BM25)      Hybrid keyword+semantic (RRF)        tsvector + pgvector hybrid
Embeddings   None                     Local ONNX (23MB, no API)            Any provider + HNSW index
Multi-table  No                       No                                   Yes (UNION ALL)

Auto-detection: if GNOSIS_MCP_DATABASE_URL is set to postgresql://..., the PostgreSQL backend is selected; if it is not set, SQLite is used. Override with GNOSIS_MCP_BACKEND=sqlite|postgres.
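The detection rule above can be sketched in a few lines. Note that detect_backend is a hypothetical helper for illustration, not a function exported by gnosis_mcp:

```python
def detect_backend(env: dict) -> str:
    """Sketch of the auto-detection rule: an explicit GNOSIS_MCP_BACKEND
    override wins; otherwise a postgresql:// URL selects PostgreSQL, and
    anything else (including no URL at all) falls back to SQLite."""
    override = env.get("GNOSIS_MCP_BACKEND", "auto")
    if override in ("sqlite", "postgres"):
        return override
    url = env.get("GNOSIS_MCP_DATABASE_URL", "")
    return "postgres" if url.startswith("postgresql://") else "sqlite"

print(detect_backend({}))  # sqlite
print(detect_backend({"GNOSIS_MCP_DATABASE_URL": "postgresql://u:p@localhost/db"}))  # postgres
```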

The [embeddings] extra installs: onnxruntime, tokenizers, numpy, sqlite-vec. Default model: MongoDB/mdbr-leaf-ir (23M params, 23MB quantized). Model auto-downloads from HuggingFace via stdlib urllib on first use. Customize with GNOSIS_MCP_EMBED_MODEL.
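The hybrid search mode fuses a keyword ranking and a semantic ranking with reciprocal rank fusion (RRF). This is a minimal sketch of the general RRF technique, not gnosis-mcp's internal code; k=60 is the conventional default constant, assumed here:

```python
def rrf_fuse(keyword_ranked, semantic_ranked, k=60):
    """Reciprocal rank fusion: each input is a list of doc ids, best first.
    A doc's fused score is the sum of 1/(k + rank) over every list it
    appears in, so docs ranked well by both retrievers rise to the top."""
    scores = {}
    for ranking in (keyword_ranked, semantic_ranked):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "b" and "c" appear in both rankings, so they beat single-list docs.
print(rrf_fuse(["a", "b", "c"], ["b", "c", "d"]))  # ['b', 'c', 'a', 'd']
```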

Tools (6)

Read Tools (always available)

  1. search_docs(query, category?, limit?, query_embedding?) — Search docs using keyword (FTS5/tsvector) or hybrid semantic+keyword search. Returns highlight field with matched terms in <mark> tags.

    • query: string (required) — search text
    • category: string (optional) — filter by category
    • limit: int (default 5, max configurable) — result count
    • query_embedding: list[float] (optional) — pre-computed embedding for hybrid search (PostgreSQL)
  2. get_doc(path, max_length?) — Get full document by file path. Reassembles chunks in order.

    • path: string (required) — e.g. "guides/quickstart.md"
    • max_length: int (optional) — truncate at N characters
  3. get_related(path) — Find related documents via bidirectional link graph.

    • path: string (required)

Write Tools (require GNOSIS_MCP_WRITABLE=true)

  1. upsert_doc(path, content, title?, category?, audience?, tags?, embeddings?) — Insert or replace document. Auto-chunks at paragraph boundaries. Optional embeddings accepts pre-computed vectors (one per chunk).

  2. delete_doc(path) — Delete document, its chunks, and links.

  3. update_metadata(path, title?, category?, audience?, tags?) — Update metadata fields on all chunks.

Resources (3)

  • gnosis://docs — List all documents with title, category, chunk count
  • gnosis://docs/{path} — Read document content by path
  • gnosis://categories — List categories with document counts

REST API (v0.10.0+)

Enable native HTTP endpoints alongside MCP on the same port. Uses Starlette (bundled with mcp>=1.20, no new dependencies).

Enable with gnosis-mcp serve --transport streamable-http --rest, or set GNOSIS_MCP_REST=true.

Endpoint                             Description
GET /health                          {"status": "ok", "version", "backend", "docs"}
GET /api/search?q=&limit=&category=  {"results": [...], "query", "count"} — auto-embeds with local provider
GET /api/docs/{path}                 {"title", "content", "category", "audience", "tags", "chunks"}
GET /api/docs/{path}/related         {"results": [{"related_path", "relation_type", "direction"}]}
GET /api/categories                  [{"category", "docs"}]

Env variable             Description
GNOSIS_MCP_REST          true/1/yes to enable REST API
GNOSIS_MCP_CORS_ORIGINS  * or comma-separated origins (e.g. http://localhost:5174)
GNOSIS_MCP_API_KEY       Bearer token required in Authorization: Bearer <key>
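Assuming the endpoints and bearer-token scheme above, a client request can be built with the stdlib alone. The host, port, and key below are placeholders matching the documented defaults:

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE = "http://127.0.0.1:8000"   # default host/port for HTTP transports
API_KEY = "my-secret-key"        # placeholder; would match GNOSIS_MCP_API_KEY

def build_search_request(query, limit=5, category=None):
    """Build an authenticated GET /api/search request object."""
    params = {"q": query, "limit": limit}
    if category:
        params["category"] = category
    url = f"{BASE}/api/search?{urlencode(params)}"
    # When GNOSIS_MCP_API_KEY is set, the server expects a bearer token.
    return Request(url, headers={"Authorization": f"Bearer {API_KEY}"})

req = build_search_request("chunking", limit=3, category="guides")
print(req.full_url)  # http://127.0.0.1:8000/api/search?q=chunking&limit=3&category=guides
# With the server running, send it via urllib.request.urlopen(req).
```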

Configuration (Environment Variables)

All settings via GNOSIS_MCP_* environment variables. Nothing required for SQLite.

Core Settings

  • GNOSIS_MCP_DATABASE_URL — PostgreSQL URL or SQLite file path (default: SQLite at ~/.local/share/gnosis-mcp/docs.db)
  • GNOSIS_MCP_BACKEND — Force backend: auto, sqlite, postgres (default: auto)
  • GNOSIS_MCP_SCHEMA — Database schema, PostgreSQL only (default: public)
  • GNOSIS_MCP_CHUNKS_TABLE — Chunks table name, comma-separated for multi-table on PG (default: documentation_chunks)
  • GNOSIS_MCP_LINKS_TABLE — Links table name (default: documentation_links)
  • GNOSIS_MCP_SEARCH_FUNCTION — Custom search function, PostgreSQL only (default: none)
  • GNOSIS_MCP_EMBEDDING_DIM — Embedding vector dimension for init-db (default: 1536)
  • GNOSIS_MCP_POOL_MIN — Min pool connections, PostgreSQL only (default: 1)
  • GNOSIS_MCP_POOL_MAX — Max pool connections, PostgreSQL only (default: 3)
  • GNOSIS_MCP_WRITABLE — Enable write tools: true/1/yes (default: false)
  • GNOSIS_MCP_WEBHOOK_URL — URL to POST on doc changes (default: none)

Embedding

  • GNOSIS_MCP_EMBED_PROVIDER — Embedding provider: openai, ollama, custom, or local (default: none, auto-detects local if [embeddings] installed)
  • GNOSIS_MCP_EMBED_MODEL — Embedding model name (default: text-embedding-3-small for remote, MongoDB/mdbr-leaf-ir for local)
  • GNOSIS_MCP_EMBED_DIM — Embedding dimension for local Matryoshka truncation and vec0 table width (default: 384)
  • GNOSIS_MCP_EMBED_API_KEY — API key for embedding provider (default: none)
  • GNOSIS_MCP_EMBED_URL — Custom embedding endpoint URL (default: none)
  • GNOSIS_MCP_EMBED_BATCH_SIZE — Chunks per embedding batch, min 1 (default: 50)
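GNOSIS_MCP_EMBED_DIM controls Matryoshka truncation for the local model. The general technique, sketched here with the stdlib on a toy vector, is to keep the first N dimensions and re-normalize so dot products still behave like cosine similarities:

```python
import math

def truncate_matryoshka(vec, dim):
    """Keep the first `dim` dimensions of a Matryoshka-trained embedding,
    then L2-normalize the result (truncation shrinks the vector's norm)."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]

full = [0.5, 0.5, 0.5, 0.5, 0.1, 0.1]       # toy 6-dim embedding
small = truncate_matryoshka(full, 4)
print(len(small))                            # 4
print(round(sum(x * x for x in small), 6))   # 1.0 (unit length again)
```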

Tuning

  • GNOSIS_MCP_CONTENT_PREVIEW_CHARS — Characters in search previews, min 50 (default: 200)
  • GNOSIS_MCP_CHUNK_SIZE — Max chars per chunk, min 500 (default: 4000)
  • GNOSIS_MCP_SEARCH_LIMIT_MAX — Max search result limit, min 1 (default: 20)
  • GNOSIS_MCP_WEBHOOK_TIMEOUT — Webhook timeout seconds, min 1 (default: 5)
  • GNOSIS_MCP_TRANSPORT — Server transport: stdio, sse, or streamable-http (default: stdio)
  • GNOSIS_MCP_HOST — Bind address for HTTP transports (default: 127.0.0.1)
  • GNOSIS_MCP_PORT — Port for HTTP transports (default: 8000)
  • GNOSIS_MCP_LOG_LEVEL — Logging: DEBUG/INFO/WARNING/ERROR/CRITICAL (default: INFO)

Column Overrides (for existing tables with non-standard names)

  • GNOSIS_MCP_COL_FILE_PATH (default: file_path)
  • GNOSIS_MCP_COL_TITLE (default: title)
  • GNOSIS_MCP_COL_CONTENT (default: content)
  • GNOSIS_MCP_COL_CHUNK_INDEX (default: chunk_index)
  • GNOSIS_MCP_COL_CATEGORY (default: category)
  • GNOSIS_MCP_COL_AUDIENCE (default: audience)
  • GNOSIS_MCP_COL_TAGS (default: tags)
  • GNOSIS_MCP_COL_EMBEDDING (default: embedding)
  • GNOSIS_MCP_COL_TSV (default: tsv)
  • GNOSIS_MCP_COL_SOURCE_PATH (default: source_path)
  • GNOSIS_MCP_COL_TARGET_PATH (default: target_path)
  • GNOSIS_MCP_COL_RELATION_TYPE (default: relation_type)

Custom Search Function (PostgreSQL)

Your function must accept:

(p_query_text text, p_categories text[], p_limit integer)

And return columns: file_path, title, content, category, combined_score.

Optionally, your function can also accept p_embedding vector(N) for hybrid search. Gnosis will try passing it automatically when query_embedding is provided.

CLI

gnosis-mcp ingest <path> [--dry-run] [--force] [--embed]   # Load files (.md/.txt/.ipynb/.toml/.csv/.json)
gnosis-mcp crawl <url> [--sitemap] [--depth N] [--include] [--exclude] [--dry-run] [--force] [--embed]
gnosis-mcp serve [--transport stdio|sse|streamable-http] [--host H] [--port P] [--ingest PATH] [--watch PATH]
gnosis-mcp search <query> [-n LIMIT] [-c CAT] [--embed]    # Search (--embed for hybrid semantic+keyword)
gnosis-mcp stats                                           # Show document/chunk/embedding counts
gnosis-mcp check                                           # Verify connection + sqlite-vec status
gnosis-mcp embed [--provider P] [--model M] [--dry-run]    # Backfill embeddings (auto-detects local provider)
gnosis-mcp init-db [--dry-run]                             # Create tables (or preview SQL)
gnosis-mcp export [-f json|markdown|csv] [-c CAT]          # Export documents
gnosis-mcp ingest-git <repo> [--since S] [--max-commits N] [--include P] [--exclude P] [--dry-run] [--embed] [--merges]
gnosis-mcp diff <path>                                     # Show what would change on re-ingest
gnosis-mcp --version                                       # Show version

Git History Ingestion

gnosis-mcp ingest-git <repo-path> converts git commit history into searchable markdown documents. Zero new dependencies — uses git log via subprocess.

gnosis-mcp ingest-git .                                      # Current repo, all files
gnosis-mcp ingest-git /path/to/repo --since 6m               # Last 6 months only
gnosis-mcp ingest-git . --include "src/*" --max-commits 5    # Filtered + limited
gnosis-mcp ingest-git . --dry-run                            # Preview without ingesting
gnosis-mcp ingest-git . --embed                              # Embed for semantic search

  • One markdown document per file with meaningful commit history
  • Each commit becomes an H2 section with date, author, subject, body
  • Stored as git-history/<file-path> to avoid collision with source docs
  • Category set to git-history for scoped searches (search_docs(query, category="git-history"))
  • Auto-links to source file paths via relates_to graph
  • Content hashing for incremental re-ingest (skips files with unchanged history)
  • --merges flag includes merge commits (skipped by default)
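The per-file documents described above (one H2 section per commit) can be sketched as a small renderer. The commit dicts and the render_history helper are illustrative, not gnosis-mcp internals:

```python
def render_history(file_path, commits):
    """Render a file's commit history as markdown: an H1 for the file,
    then one H2 section per commit with date, subject, author, and body."""
    lines = [f"# History: {file_path}", ""]
    for c in commits:
        lines.append(f"## {c['date']}: {c['subject']}")
        lines.append(f"Author: {c['author']}")
        if c.get("body"):
            lines.append("")
            lines.append(c["body"])
        lines.append("")
    return "\n".join(lines)

doc = render_history("src/app.py", [
    {"date": "2024-05-01", "author": "Ada", "subject": "Fix retry loop",
     "body": "Backoff was not applied on the first retry."},
])
print(doc.splitlines()[2])  # ## 2024-05-01: Fix retry loop
```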

Web Crawl

gnosis-mcp crawl <url> fetches and ingests documentation from any website. Requires the [web] extra (pip install gnosis-mcp[web]).

gnosis-mcp crawl https://docs.stripe.com/ --sitemap           # Crawl via sitemap
gnosis-mcp crawl https://fastapi.tiangolo.com/ --depth 2      # BFS link crawl with depth limit
gnosis-mcp crawl https://docs.python.org/ --dry-run            # Preview discovered URLs
gnosis-mcp crawl https://docs.example.com/ --sitemap --embed   # Crawl + embed for semantic search

  • Sitemap.xml discovery (--sitemap) and BFS link crawling (--depth N)
  • robots.txt compliance — respects Disallow rules
  • ETag/Last-Modified HTTP caching for incremental re-crawl (304 Not Modified)
  • Content hashing: skips unchanged pages on re-crawl
  • URL path filtering with --include and --exclude glob patterns
  • Rate-limited concurrent fetching (5 concurrent, 0.2s delay)
  • SSRF protection: blocks private/internal IPs and checks redirect targets
  • Crawled pages stored with URL as file_path, hostname as category
  • Force re-crawl with --force, dry run with --dry-run, embed with --embed
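The SSRF-protection bullet can be illustrated with the stdlib ipaddress module. This is a hypothetical guard, not the crawler's actual code: it resolves each fetch (and redirect) target and refuses private, loopback, link-local, and reserved addresses:

```python
import ipaddress
import socket

def is_blocked_host(hostname):
    """Return True if the hostname resolves to a private, loopback,
    link-local, or reserved address — the core of an SSRF guard."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return True  # unresolvable: refuse rather than guess
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            return True
    return False

print(is_blocked_host("127.0.0.1"))     # True (loopback)
print(is_blocked_host("192.168.1.10"))  # True (private range)
```

A real crawler must also re-check the guard on every redirect hop, since a public URL can redirect to an internal address.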

Ingest

gnosis-mcp ingest <path> scans a file or directory for supported files (.md, .txt, .ipynb, .toml, .csv, .json) and loads them into the database. Non-markdown formats are auto-converted using Python stdlib only — zero extra dependencies.

  • Chunks by H2 headers (H3/H4 for oversized sections). Never splits inside fenced code blocks or tables
  • Parses YAML-like frontmatter for title, category, audience, tags
  • Auto-linking: relates_to in frontmatter populates the links table (supports comma-separated and YAML list, skips glob patterns)
  • Content hashing: skips unchanged files on re-run
  • Watch mode: gnosis-mcp serve --watch ./docs/ auto-re-ingests on file changes (mtime polling + debounce + auto-embed)
  • Category inferred from parent directory name
  • Title extracted from first H1 heading
  • Skips tiny files (<50 chars)
  • Use --dry-run to preview without writing
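The chunking rule above (split at H2 headers, never inside fenced code blocks) can be sketched as a fence-aware splitter. A real implementation also handles the H3/H4 fallback for oversized sections and table protection, omitted here:

```python
def chunk_by_h2(markdown):
    """Split markdown into chunks at '## ' headings, treating headings
    inside code fences as ordinary content rather than split points."""
    chunks, current, in_fence = [], [], False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence          # toggle on opening/closing fence
        if line.startswith("## ") and not in_fence and current:
            chunks.append("\n".join(current))  # close the previous chunk
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks

doc = "# Title\nintro\n## A\ntext\n```\n## not a heading\n```\n## B\nmore"
print(len(chunk_by_h2(doc)))  # 3 — the fenced '## not a heading' stays in chunk 2
```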

Architecture

src/gnosis_mcp/
├── backend.py         # DocBackend Protocol + create_backend() factory
├── pg_backend.py      # PostgreSQL backend — asyncpg, tsvector, pgvector, UNION ALL
├── sqlite_backend.py  # SQLite backend — aiosqlite, FTS5 MATCH + bm25()
├── sqlite_schema.py   # SQLite DDL — tables, FTS5, triggers, indexes
├── config.py          # GnosisMcpConfig frozen dataclass, backend auto-detection
├── db.py              # Backend lifecycle + FastMCP lifespan
├── server.py          # FastMCP server: 6 tools + 3 resources + webhook helper
├── ingest.py          # File scanner + converters: multi-format, smart chunking (H2/H3/H4), hashing
├── crawl.py           # Web crawler — sitemap/BFS discovery, robots.txt, ETag caching, trafilatura
├── parsers/           # Non-file ingest sources
│   └── git_history.py # Git log → markdown documents per file (commit parsing, grouping, rendering)
├── watch.py           # File watcher: mtime polling, auto-re-ingest on changes
├── schema.py          # PostgreSQL DDL — tables, indexes, HNSW, hybrid search functions
├── embed.py           # Embedding sidecar: provider abstraction (openai/ollama/custom/local)
├── local_embed.py     # Local ONNX embedding engine — stdlib urllib model download
└── cli.py             # argparse CLI: serve, init-db, ingest, ingest-git, crawl, search, embed, stats, export, diff, check

Default install deps: mcp + aiosqlite. Optional: asyncpg (via [postgres] extra), onnxruntime + tokenizers + numpy + sqlite-vec (via [embeddings] extra), httpx + trafilatura (via [web] extra). Model download uses stdlib urllib (no huggingface-hub dependency).

Performance

  • SQLite FTS5 keyword search: 9,463 QPS on 100 docs (300 chunks); 471 QPS on 10,000 docs (30,000 chunks); p95 under 6 ms at the 10K corpus
  • End-to-end through the MCP stdio protocol: 8.7 ms mean, 13.0 ms p95 (v0.11.0, after the mcp SDK 1.27 transport upgrade)
  • 632 tests; 10 RAG eval cases (Hit@5 = 1.00, MRR = 0.95, Precision@5 = 0.67)
  • Install size: ~23MB with [embeddings] (ONNX model), ~5MB base
  • Benchmarks: gnosis-mcp eval, python tests/bench/bench_search.py, python tests/bench/bench_rag.py, python tests/bench/bench_mcp_e2e.py; see docs/benchmarks.md for methodology

License

MIT — https://github.com/nicholasglazer/gnosis-mcp