Configuration Reference
Every knob gnosis-mcp exposes is an environment variable prefixed
GNOSIS_MCP_ (apart from the unprefixed DATABASE_URL fallback). There is no TOML/YAML config file. Philosophy:
- Zero config is the default: running gnosis-mcp serve with no env set boots SQLite at ~/.local/share/gnosis-mcp/docs.db and serves stdio.
- Every override is an env var, so you can configure inside Docker, systemd, Claude Desktop / Cursor configs, or direnv with the same primitive.
- Secrets stay in env, never in the DB, never on disk.
Variables are grouped below by what they control.
Core
GNOSIS_MCP_BACKEND
auto | sqlite | postgres — default auto.
auto inspects GNOSIS_MCP_DATABASE_URL / DATABASE_URL: a postgresql://
URL selects Postgres, anything else (or unset) selects SQLite.
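The auto rule above can be sketched in a few lines. This is an illustration under assumptions, not the server's actual code: the function name resolve_backend is hypothetical, and treating the value case-insensitively and raising on unknown values are guesses about behavior the reference does not specify.

```python
import os


def resolve_backend(env=None):
    """Return 'sqlite' or 'postgres' following the documented auto rule."""
    env = os.environ if env is None else env
    # Assumption: the value is matched case-insensitively.
    backend = env.get("GNOSIS_MCP_BACKEND", "auto").lower()
    if backend in ("sqlite", "postgres"):
        return backend  # an explicit choice wins over URL inspection
    if backend != "auto":
        # Assumption: unknown values are rejected rather than ignored.
        raise ValueError(f"unknown backend: {backend!r}")
    # GNOSIS_MCP_DATABASE_URL takes priority; DATABASE_URL is the fallback.
    url = env.get("GNOSIS_MCP_DATABASE_URL") or env.get("DATABASE_URL") or ""
    # A postgresql:// URL selects Postgres; anything else (or unset) is SQLite.
    return "postgres" if url.startswith("postgresql://") else "sqlite"
```

Passing a plain dict instead of os.environ makes the rule easy to check in isolation, for example resolve_backend({"DATABASE_URL": "postgresql://user:pass@host:5432/db"}).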
GNOSIS_MCP_DATABASE_URL
Database connection string. Falls back to DATABASE_URL if unset.
- SQLite: sqlite:///absolute/path/to/docs.db (or leave unset for the default XDG-compliant path).
- Postgres: standard libpq URL, e.g. postgresql://user:pass@host:5432/db.
GNOSIS_MCP_WRITABLE
true | false — default false.
Gate for the three write tools (upsert_doc, delete_doc, update_metadata).
When false, write tools return a structured error and no data is mutated.
GNOSIS_MCP_LOG_LEVEL
DEBUG | INFO | WARNING | ERROR | CRITICAL — default INFO.
Transport
GNOSIS_MCP_TRANSPORT
stdio | streamable-http | sse — default stdio.
Stdio is the MCP-client default; streamable-http exposes a /mcp endpoint
you can deploy publicly or on a network; sse is a legacy MCP transport.
GNOSIS_MCP_HOST
Default 127.0.0.1. Bind address for the HTTP transports. Use 0.0.0.0
to accept connections from other hosts (and remember to put an auth layer
in front — see GNOSIS_MCP_API_KEY).
GNOSIS_MCP_PORT
Default 8000. Port for the HTTP transports (streamable-http and sse).
Ingestion & chunking
GNOSIS_MCP_CHUNK_SIZE
Default 2000. Minimum 500. Unit: characters (not tokens, not
words).
Target character length for chunks. Chunks never split inside fenced code blocks or markdown tables; splits prefer H2, then H3/H4, then paragraph boundaries.
Rough conversion: 2000 chars ≈ 600 tokens ≈ 300-350 English words.
Why 2000. On a real 558-doc developer-docs corpus with 25 hand-written golden queries, we swept chunk sizes 1000 → 4000 chars in steps. The peak sits on an 1800-2000 char plateau (0.8702 nDCG@10); both smaller (fragments sections, dilutes BM25 term density) and larger (merges unrelated content, same dilution in the other direction) score worse. 2000 is chosen over 1800 as the high end of the plateau — same quality, fewer chunks, faster ingest. Full sweep in bench-experiments-2026-04-18.
Raise it to 3000-4000 for long-form prose (blog posts, ADRs) where sections are naturally bigger. Lower to 1000-1500 for API references or rows-of-tables content where each fact is short and standalone.
GNOSIS_MCP_MAX_DOC_BYTES
Default 50_000_000 (50 MB).
Maximum content size accepted by upsert_doc. Prevents accidentally
attempting to index a 2 GB SQL dump.
GNOSIS_MCP_CONTENT_PREVIEW_CHARS
Default 200. Minimum 50.
Length of the preview slice returned by search_docs. Set larger if you want
chunks returned nearly whole; smaller if you're paying per-token downstream.
Search
GNOSIS_MCP_SEARCH_LIMIT_MAX
Default 20. Minimum 1.
Hard ceiling for the limit param on search_docs. Clients can ask for
larger numbers but get clamped.
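A minimal sketch of the clamping described above. The helper name effective_limit is hypothetical, and raising out-of-range values up to the documented minimum of 1 (rather than rejecting them) is an assumption.

```python
import os


def effective_limit(requested, env=None):
    """Clamp a client-supplied limit to GNOSIS_MCP_SEARCH_LIMIT_MAX."""
    env = os.environ if env is None else env
    # Documented default is 20 and the documented minimum is 1.
    ceiling = max(1, int(env.get("GNOSIS_MCP_SEARCH_LIMIT_MAX", "20")))
    # Assumption: requests below 1 are raised to 1 instead of being rejected.
    return max(1, min(requested, ceiling))
```

With no override set, a client asking for 50 results receives at most 20.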
GNOSIS_MCP_MAX_QUERY_CHARS
Default 10_000.
Rejects pathological queries early. Legitimate semantic queries are rarely over a couple hundred characters.
GNOSIS_MCP_RRF_K
Default 60.
Constant in the Reciprocal-Rank-Fusion formula used by hybrid search:
score = Σ 1 / (k + rank_i). Higher k flattens the rank curve, so a single
top-ranked hit counts for less and documents ranked well by both BM25 and
vector search gain relative weight. Typical values are 30–120.
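The fusion formula can be illustrated with a short, generic implementation. This is a sketch of textbook Reciprocal Rank Fusion with 1-based ranks, not the server's code; the function name rrf_fuse and the sample document ids are made up for the example.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of doc ids into one list of (doc_id, score)."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the best hit in each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists from the two retrievers.
bm25 = ["a", "b", "c"]
vector = ["c", "a", "d"]
fused = rrf_fuse([bm25, vector], k=60)
```

Here "a" (ranks 1 and 2) narrowly beats "c" (ranks 3 and 1), and both outrank "b" and "d", which each appear in only one list.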
GNOSIS_MCP_SEARCH_FUNCTION
(Postgres only.) Name of a user-defined func(query, limit) → table. When
set, search_docs delegates to it instead of the built-in path. Useful for
plugging in experimental ranking without forking the server.
Embeddings
GNOSIS_MCP_EMBED_PROVIDER
local | openai | ollama | custom — unset by default (no auto-embedding).
- local: ONNX Runtime CPU inference. Requires the [embeddings] extra.
- openai: OpenAI-compatible HTTP API.
- ollama: an Ollama-compatible HTTP API.
- custom: any OpenAI-schema HTTP endpoint (set GNOSIS_MCP_EMBED_URL).
GNOSIS_MCP_EMBED_MODEL
Model name. The general default is text-embedding-3-small (OpenAI, 1536-dim).
When GNOSIS_MCP_EMBED_PROVIDER=local, the default switches to
MongoDB/mdbr-leaf-ir (384-dim, 23 MB quantized, Apache 2.0, auto-downloaded
on first run with an HTTPS + SHA-256 checksum assertion).
GNOSIS_MCP_EMBED_DIM / GNOSIS_MCP_EMBEDDING_DIM
Output dimension. Both spellings accepted. When unset, the server asks the provider to report it once at start-up.
GNOSIS_MCP_EMBED_URL
Custom / remote provider endpoint (OpenAI-schema /embeddings POST).
GNOSIS_MCP_EMBED_API_KEY
Bearer token for the remote provider.
GNOSIS_MCP_EMBED_BATCH_SIZE
Default 50. Minimum 1. Balances provider rate limits against ingest
throughput.
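To make the trade-off concrete, here is a generic batching sketch: larger batches mean fewer provider requests, smaller ones stay under per-request limits. The helper name batched_chunks is hypothetical and says nothing about how the server actually schedules its embedding calls.

```python
def batched_chunks(chunks, batch_size=50):
    """Yield consecutive slices of at most batch_size chunks."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(chunks), batch_size):
        yield chunks[start:start + batch_size]


# 120 chunks at the default batch size become three provider requests.
sizes = [len(batch) for batch in batched_chunks(list(range(120)))]
```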
Reranking
GNOSIS_MCP_RERANK_ENABLED
true | false — default false.
Opt-in cross-encoder reranker (22M-param ONNX) applied to the top candidates
from search_docs before returning. Requires the [reranking] extra.
Typical cost: ~20 ms per query for the default top-20 pool on laptop CPU.
Web crawl
GNOSIS_MCP_CRAWL_EXTRACT_TIMEOUT_S
Default 30. Seconds before we abandon the HTML-to-markdown extraction
for a given page. Prevents pathological pages from freezing the crawl loop.
Webhooks
GNOSIS_MCP_WEBHOOK_URL
Sends a fire-and-forget POST after every write-tool call. Body is a small JSON
envelope: {tool, path, ts}. Useful for invalidating downstream caches.
GNOSIS_MCP_WEBHOOK_TIMEOUT
Default 5. Seconds. Minimum 1.
GNOSIS_MCP_WEBHOOK_ALLOW_PRIVATE
true | false — default false.
By default the webhook target must resolve to a public IP. Requests to
private, loopback, link-local, multicast, or reserved addresses are refused
with a warning log. Set true for intentional loopback CI setups.
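The address classes listed above map directly onto Python's standard ipaddress module. The sketch below checks a literal IP only; a real guard would first resolve the webhook hostname and test every returned address. The function name is_public_ip is hypothetical and this is not the server's implementation.

```python
import ipaddress


def is_public_ip(address):
    """Return True when a literal IP falls outside the refused ranges."""
    ip = ipaddress.ip_address(address)
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
    )
```

For example, is_public_ip("8.8.8.8") is accepted, while is_public_ip("127.0.0.1") and is_public_ip("10.0.0.5") are refused.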
REST API
Enable with the --rest flag on gnosis-mcp serve or GNOSIS_MCP_REST=true.
Lives alongside MCP on the same HTTP port. See rest-api.md
for the endpoint reference.
GNOSIS_MCP_REST
true | false — default false.
GNOSIS_MCP_API_KEY
Optional. When set, every endpoint (except /health) requires
Authorization: Bearer <key>. Comparison is timing-safe.
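A timing-safe bearer check can be sketched with the standard library's hmac.compare_digest, which avoids leaking how many leading characters matched. The function name is_authorized and the plain-dict headers are assumptions made for the example; the server's own middleware may be structured differently.

```python
import hmac


def is_authorized(headers, api_key, path, public_paths=("/health",)):
    """Return True when the request may proceed."""
    # No key configured, or a public path: nothing to check.
    if not api_key or path in public_paths:
        return True
    supplied = headers.get("Authorization", "")
    expected = f"Bearer {api_key}"
    # compare_digest runs in time independent of where the inputs differ.
    return hmac.compare_digest(supplied.encode(), expected.encode())
```

Encoding both sides to bytes lets the comparison accept non-ASCII header values without raising.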
GNOSIS_MCP_PUBLIC_PATHS
Comma-separated list of paths that bypass auth. /health is always public.
Useful when mounting a custom /status or /version endpoint.
GNOSIS_MCP_CORS_ORIGINS
Comma-separated origins, or *. No CORS response headers unless set.
Access log
GNOSIS_MCP_ACCESS_LOG
true | false — default true.
When enabled, records which documents are retrieved via search_docs
(top 3 results) and get_doc. Used by get_context to surface
frequently-read documentation. Writes to search_access_log table; set to
false to disable tracking entirely.
Postgres-specific
GNOSIS_MCP_SCHEMA
Default public. Alternate schema for all gnosis-mcp tables.
GNOSIS_MCP_CHUNKS_TABLE
Default documentation_chunks. Single name or comma-separated list —
with multiple tables, search queries use UNION ALL.
GNOSIS_MCP_LINKS_TABLE
Default documentation_links.
GNOSIS_MCP_POOL_MIN / GNOSIS_MCP_POOL_MAX
asyncpg connection-pool bounds. Defaults 1 / 3.
Column overrides (GNOSIS_MCP_COL_*)
When connecting to an existing schema with non-standard column names, map each field:
| Env var | Logical column |
|---|---|
| GNOSIS_MCP_COL_FILE_PATH | file_path |
| GNOSIS_MCP_COL_CHUNK_INDEX | chunk_index |
| GNOSIS_MCP_COL_TITLE | title |
| GNOSIS_MCP_COL_CATEGORY | category |
| GNOSIS_MCP_COL_CONTENT | content |
| GNOSIS_MCP_COL_AUDIENCE | audience |
| GNOSIS_MCP_COL_TAGS | tags |
| GNOSIS_MCP_COL_EMBEDDING | embedding |
| GNOSIS_MCP_COL_TSV | search_vector |
| GNOSIS_MCP_COL_SOURCE_PATH | source_path (links) |
| GNOSIS_MCP_COL_TARGET_PATH | target_path (links) |
| GNOSIS_MCP_COL_RELATION_TYPE | relation_type (links) |
Every identifier is validated against ^[a-zA-Z_][a-zA-Z0-9_]*$ at startup
to prevent SQL injection via config.
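The identifier check is easy to reproduce when vetting a configuration before deployment. The sketch below uses the pattern quoted above; the helper name is_valid_identifier is hypothetical. It calls fullmatch because in Python a bare $ also matches just before a trailing newline.

```python
import re

# Pattern quoted in the reference: letter or underscore first,
# then letters, digits, or underscores.
IDENTIFIER_RE = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")


def is_valid_identifier(name):
    """Return True when name is safe to interpolate as a SQL identifier."""
    # fullmatch rejects "title\n", which match() with $ would accept.
    return IDENTIFIER_RE.fullmatch(name) is not None
```

A value such as "documentation_chunks" passes, while "chunks; DROP TABLE x" or "1st_column" does not.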
Precedence
Highest priority first:
- Explicit env var.
- Derived default (e.g. GNOSIS_MCP_BACKEND=auto inferring from URL).
- Hard-coded default.
There is no file-based override layer. Restart the server to pick up env changes.