
How to Fix OpenClaw's Memory Search with QMD

Last updated on Mar 3, 2026

If you’ve been running OpenClaw for a while, you’ve probably noticed a pattern: the more your agent remembers, the worse it gets at finding what it already knows.

Your agent writes daily logs, saves preferences to MEMORY.md, and accumulates weeks of context.

But ask it to recall a specific decision from three weeks ago and it either misses it or pulls up something tangentially related.

The problem isn’t that the memory is gone. It’s that the default search can’t find it.

I wrote about setting up OpenClaw for daily intelligence briefings a few weeks ago.

Since then, I’ve been digging into the memory side and landed on QMD.

TL;DR: OpenClaw’s default SQLite memory search struggles as your agent accumulates context. QMD replaces it with a local hybrid search engine that runs entirely on your machine. Install with bun install -g https://github.com/tobi/qmd, set memory.backend = "qmd" in your config, and restart OpenClaw.


What is QMD?

QMD (Query Markup Documents) is a local search engine for Markdown files created by Tobi Lütke (of Shopify fame).

It combines three search approaches:

  1. BM25 full-text search — fast keyword matching. Great for exact terms, error messages, code symbols, and IDs.
  2. Vector semantic search — finds conceptually similar content even when the wording differs. Uses local GGUF embedding models.
  3. Hybrid search with LLM re-ranking — runs both in parallel, merges results using Reciprocal Rank Fusion, then re-ranks with a local language model.
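The merge step can be sketched with Reciprocal Rank Fusion: each document earns `1/(k + rank)` from every list it appears in, and the sums decide the merged order. This is a minimal illustration of the general RRF technique, not QMD's actual code — the constant `k=60` is the common default from the literature, and the document IDs are made up:

```python
def rrf_merge(rankings, k=60):
    """Merge several ranked result lists with Reciprocal Rank Fusion.

    rankings: ranked lists of document IDs, best first.
    Each document scores 1/(k + rank) per list; scores are summed,
    so documents ranked well by BOTH searches float to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Toy example: BM25 and vector search mostly agree on two notes.
bm25 = ["note-gateway", "note-dns", "note-backup"]
vector = ["note-closet-mini", "note-gateway", "note-dns"]
merged = rrf_merge([bm25, vector])
```

A note that appears in both lists ("note-gateway") beats a note that tops only one of them, which is exactly why fusion is more robust than either search alone.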

Everything runs locally, no API keys, no cloud dependencies.

Three small GGUF models auto-download on first run:

| Model | Purpose | Size |
| --- | --- | --- |
| embedding-gemma-300M | Vector embeddings | ~300MB |
| qwen3-reranker-0.6b | Result re-ranking | ~640MB |
| qmd-query-expansion-1.7B | Query expansion | ~1.1GB |

Here’s why this matters.

Say your agent saved this note three weeks ago:

Decided to run the gateway on the Mac Mini in the closet. Port 18789, Cloudflare tunnel for external access.

Search for “gateway server setup” with the default SQLite backend and it misses this — the note doesn’t contain “server” or “setup.”

QMD finds it.

The vector search catches the conceptual match, BM25 hits on “gateway,” and query expansion fills in related phrasings like “infrastructure configuration” before the re-ranker sorts the results.
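You can see the keyword gap with a toy overlap check. This is a hypothetical sketch of why pure keyword matching comes up empty here, not QMD's or SQLite's actual matching logic:

```python
note = ("Decided to run the gateway on the Mac Mini in the closet. "
        "Port 18789, Cloudflare tunnel for external access.")

def matching_terms(query, text):
    """Return the query terms that literally appear in the text."""
    text_tokens = set(text.lower().replace(".", " ").replace(",", " ").split())
    return [t for t in query.lower().split() if t in text_tokens]

overlap = matching_terms("gateway server setup", note)
# Only "gateway" overlaps; "server" and "setup" never occur in the note.
```

A keyword-only ranker has a single term to work with, while an embedding of the note sits close to "gateway server setup" in vector space regardless of wording.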

The trade-off is speed.

Hybrid searches take a few seconds instead of being instant.

For me, accurate recall is worth more than a couple seconds of latency.


Comparing OpenClaw’s memory options

OpenClaw’s memory system supports three search backends.

The default SQLite with vector search works out of the box.

It handles paraphrases well but misses exact tokens like IDs, error strings, and code symbols.

SQLite with hybrid search adds BM25 keyword matching alongside vectors, with optional MMR deduplication and temporal decay, no extra install needed.
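MMR (Maximal Marginal Relevance) deduplication picks each next result by trading off relevance against similarity to results already chosen. A minimal greedy sketch with made-up scores, to show the idea rather than OpenClaw's implementation:

```python
def mmr_select(candidates, relevance, similarity, lam=0.7, k=2):
    """Greedy MMR: balance relevance against redundancy.

    relevance: doc -> query relevance score
    similarity: (doc_a, doc_b) -> pairwise similarity
    lam: weight on relevance (1.0 would be a pure relevance ranking)
    """
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        def mmr_score(doc):
            redundancy = max((similarity(doc, s) for s in selected), default=0.0)
            return lam * relevance[doc] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected

# "a" and "b" are near-duplicates; "c" is less relevant but novel.
relevance = {"a": 0.9, "b": 0.85, "c": 0.6}
pair_sim = {frozenset(p): s
            for p, s in [(("a", "b"), 0.95), (("a", "c"), 0.1), (("b", "c"), 0.1)]}
picked = mmr_select(["a", "b", "c"], relevance,
                    lambda x, y: pair_sim[frozenset((x, y))])
```

The near-duplicate "b" gets skipped in favor of the novel "c", which is what keeps three copies of the same daily-log entry from crowding out a distinct memory.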

QMD goes further with query expansion and LLM re-ranking.

| | SQLite (Vector) | SQLite (Hybrid) | QMD |
| --- | --- | --- | --- |
| Setup | None (built-in) | Config change | Install binary + config |
| Search types | Semantic only | Semantic + BM25 | Semantic + BM25 + LLM re-ranking |
| Query expansion | No | No | Yes |
| LLM re-ranking | No | No | Yes |
| Result diversity (MMR) | No | Yes (optional) | Built into ranking |
| Temporal decay | No | Yes (optional) | No (use dated file organization) |
| Embedding provider | Local, OpenAI, Gemini, Voyage | Local, OpenAI, Gemini, Voyage | Local only (GGUF) |
| API keys needed | Optional (local model works) | Optional (local model works) | None |
| Disk overhead | ~600MB (local model) | ~600MB (local model) | ~2GB (3 GGUF models) |
| Privacy | Full (with local embeddings) | Full (with local embeddings) | Full (always local) |
| Speed | Fast | Fast | Fast (BM25) to moderate (hybrid) |
| External dir indexing | Via extraPaths | Via extraPaths | Via paths[] with patterns |
| Fallback on failure | N/A (built-in) | Falls back to vector-only | Falls back to SQLite |

My recommendation: If you just set up OpenClaw and have a few daily logs, the default is fine.

If you’ve been running for weeks and noticing gaps in recall, switch to QMD.

If you want a middle ground without installing anything extra, enable hybrid search on SQLite first:

memorySearch: {
  query: {
    hybrid: {
      enabled: true,
      vectorWeight: 0.7,
      textWeight: 0.3,
      candidateMultiplier: 4
    }
  }
}
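Conceptually, those weights blend the two score lists into one ranking. The sketch below shows weighted score fusion under the assumption that both score sets are already normalized to 0–1; it is an illustration of what the knobs control, not OpenClaw's exact code:

```python
def blend(vector_scores, text_scores, vector_weight=0.7, text_weight=0.3):
    """Combine normalized vector and BM25 scores into one ranking."""
    docs = set(vector_scores) | set(text_scores)
    combined = {
        d: vector_weight * vector_scores.get(d, 0.0)
           + text_weight * text_scores.get(d, 0.0)
        for d in docs
    }
    return sorted(combined, key=combined.get, reverse=True)

# Hypothetical scores: one note matches semantically, another
# contains an exact error string that only BM25 catches.
vector_scores = {"note-gateway": 0.82, "note-dns": 0.40}
text_scores = {"note-gateway": 0.30, "note-error-log": 0.95}
ranked = blend(vector_scores, text_scores)
```

With `vectorWeight: 0.7`, a strong semantic match outranks a strong keyword-only match; flip the weights and exact-token hits like error strings win instead.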

Prerequisites

  • OpenClaw running — follow my OpenClaw setup guide if you haven’t
  • Node.js 22+ or Bun 1.0+
  • ~2GB disk space for GGUF models (auto-downloaded on first run)
  • macOS or Linux (Windows via WSL2)
  • On macOS: brew install sqlite for SQLite extension support

Setting up QMD as your OpenClaw memory backend

Install QMD

bun install -g https://github.com/tobi/qmd

Or with npm:

npm install -g @tobilu/qmd

If you installed via Bun, add $HOME/.bun/bin to your PATH:

export PATH="$HOME/.bun/bin:$PATH"

Verify:

qmd --help

Enable the QMD backend

memory: {
  backend: "qmd",
  citations: "auto"
}

citations: "auto" is optional but adds source path and line numbers to results.

What happens on boot

When OpenClaw starts with QMD enabled:

  1. Creates a QMD environment at ~/.openclaw/agents/<agentId>/qmd/
  2. Indexes your workspace memory files (MEMORY.md and memory/**/*.md) plus any configured external paths
  3. Runs qmd update (text indexing) and qmd embed (vector embeddings — slower on first run as models download)
  4. Re-indexes every 5 minutes in the background

The refresh runs asynchronously, so your agent is available for chat right away.

Verify it works

Restart OpenClaw, then ask your agent about something in its memory.

If it returns relevant results with source citations, QMD is working.

You can also verify directly:

STATE_DIR="${OPENCLAW_STATE_DIR:-$HOME/.openclaw}"
export XDG_CONFIG_HOME="$STATE_DIR/agents/main/qmd/xdg-config"
export XDG_CACHE_HOME="$STATE_DIR/agents/main/qmd/xdg-cache"

qmd status
qmd search "test query" -c memory-root

Advanced configuration

Search modes

memory: {
  backend: "qmd",
  qmd: {
    searchMode: "search"  // "search", "vsearch", or "query"
  }
}
  • search (default) — BM25 keyword search. Fast, usually instant.
  • vsearch — semantic vector search. Slower, but finds conceptually similar results.
  • query — full hybrid pipeline with LLM re-ranking. Highest quality, slowest.

Start with the default and switch to query once you’re comfortable with slightly longer response times.

Indexing external Markdown directories

You can point QMD at any Markdown directory — Obsidian vaults, project docs, meeting notes — and search across all of them in a single query.

memory: {
  backend: "qmd",
  qmd: {
    includeDefaultMemory: true,
    paths: [
      { name: "notes", path: "~/notes", pattern: "**/*.md" },
      { name: "obsidian", path: "~/Documents/Obsidian", pattern: "**/*.md" },
      { name: "work-docs", path: "~/work/docs", pattern: "**/*.md" }
    ]
  }
}

Each path gets its own named collection. includeDefaultMemory: true keeps your agent’s own memory files indexed alongside external directories.

Tuning result limits and intervals

memory: {
  backend: "qmd",
  qmd: {
    update: {
      interval: "5m",
      debounceMs: 15000,
      onBoot: true,
      waitForBootSync: false
    },
    limits: {
      maxResults: 6,
      maxSnippetChars: 700,
      timeoutMs: 4000
    }
  }
}

The defaults are sensible.

Only adjust if you’re seeing too many/few results or timeouts.

Pre-warming the index

The first run downloads models and builds embeddings from scratch.

To avoid a slow first interaction, pre-warm manually:

STATE_DIR="${OPENCLAW_STATE_DIR:-$HOME/.openclaw}"
export XDG_CONFIG_HOME="$STATE_DIR/agents/main/qmd/xdg-config"
export XDG_CACHE_HOME="$STATE_DIR/agents/main/qmd/xdg-cache"

qmd update && qmd embed
qmd query "test" -c memory-root --json >/dev/null 2>&1

Enable the automatic memory flush

Easy to miss, but it matters.

OpenClaw has a “pre-compaction ping” — when a session approaches context compaction, it silently prompts the model to write important context to disk before the window resets.

Without it, your agent loses decisions, preferences, and facts that were discussed but never explicitly saved.

Better search is only useful if the memories make it into the files.

agents: {
  defaults: {
    compaction: {
      memoryFlush: {
        enabled: true,
        softThresholdTokens: 4000
      }
    }
  }
}

The agent responds with NO_REPLY so you never see the interaction, but the memories it saves show up in QMD searches later.
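Conceptually, the trigger is just a token-budget check: once the remaining context headroom drops below the soft threshold, the flush prompt fires. A hypothetical sketch of the idea (OpenClaw's actual internals may differ; `soft_threshold_tokens` mirrors the `softThresholdTokens` key above):

```python
def should_flush(context_limit, tokens_used, soft_threshold_tokens=4000):
    """Fire the memory-flush prompt once headroom drops below the threshold."""
    return (context_limit - tokens_used) <= soft_threshold_tokens

# With a hypothetical 128k-token window, the flush fires once fewer
# than 4k tokens of headroom remain.
fires = should_flush(128_000, 125_000)
```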


Wrapping up

The whole thing takes about five minutes: install QMD, change one config line, restart.

If you haven’t set up OpenClaw yet, start with my guide on automating daily intelligence briefings first.

Once you’re running, come back here and upgrade the memory.

Frequently Asked Questions

What is QMD?
QMD (Query Markup Documents) is a local search engine for Markdown files that combines BM25 keyword search, vector semantic search, and LLM re-ranking. It was created by Tobi Lütke and runs entirely on your machine with no API keys or cloud dependencies.
Does QMD require an API key?
No. QMD runs entirely locally using three small GGUF models totaling about 2GB. No API keys, no cloud services, and no data leaves your machine.
How much disk space does QMD need?
About 2GB for the three GGUF models (embedding, re-ranking, and query expansion). The SQLite search index grows with your content but is typically small.
Can I use QMD with Obsidian or other Markdown files?
Yes. QMD can index any directory of Markdown files alongside your OpenClaw agent's memory. Configure additional paths in your OpenClaw config to search across Obsidian vaults, project docs, meeting notes, and more in a single query.
