memweave: Zero-Infra AI Agent Memory with Markdown and SQLite — No Vector Database Required



Disclosure: memweave is an open-source project I built. This article describes the problem it addresses and the design decisions behind it.

Picture this: you spend an afternoon building an AI coding assistant. It learns your project’s conventions, remembers that you use Valkey instead of Redis, and knows your team’s preferred testing patterns. The session ends. You open a new conversation the next morning, and it has forgotten everything. Back to square one.

This is the default state of every LLM agent. Models are stateless by design. Each call starts with a blank slate. Memory is your problem to solve.

The most common workaround is to stuff the entire conversation history into the context window. It works — until it doesn’t. Context windows are finite and expensive. A long-running agent accumulates thousands of tokens of history, most of which are irrelevant to the current question. You end up paying to repeatedly feed your agent last week’s debugging notes when all it needs is one architecture decision from three months ago.

So you reach for a vector database. Spin up Chroma, or provision a Pinecone index, embed everything, and query by semantic similarity. This works too — but it introduces a new class of problems:

  • Opacity. Your agent’s memory lives in a binary index you cannot open, read, or reason about. What does your agent actually know? You can only find out by querying it.
  • No version control. There is no git diff for a vector store. You cannot see what an agent learned between runs, audit its knowledge, or roll back a bad memory.
  • Infrastructure overhead. Even for a single local agent, you now have a server process to manage, credentials to configure, and a service to keep running.
  • Stale memory, no remedy. A vector DB ranks results by semantic similarity, full stop. A debugging note from six months ago competes on equal footing with a decision made this morning. Older, stale context surfaces confidently alongside fresh knowledge — and there is no built-in mechanism to prefer the recent over the outdated.
  • Invisible edits. If you want to correct a memory — fix a wrong assumption the agent stored — you need to delete and re-embed. You cannot just open the file and change a line.

The deeper issue is that none of these tools were designed for agent memory. They were designed for document retrieval at scale. Using them for a personal or project-scoped agent is like deploying a PostgreSQL cluster to store a config file.

There is a simpler way.

The Approach: Markdown + SQLite


The core idea behind memweave is deliberately simple: memories are .md files you write to disk. memweave indexes them into a local SQLite database and lets you search across them with hybrid BM25 + semantic vector search. The database is always a derived cache — if you delete it, memweave rebuilds it from the files. The files are the source of truth.

pip install memweave

Here is everything you need to give an agent persistent memory:

import asyncio
from pathlib import Path
from memweave import MemWeave, MemoryConfig

async def main():
    async with MemWeave(MemoryConfig(workspace_dir=".")) as mem:
        # Write a memory - just a plain Markdown file
        memory_file = Path("memory/stack.md")
        memory_file.parent.mkdir(exist_ok=True)
        memory_file.write_text("We use Valkey instead of Redis. Target latency SLA: 5ms p99.")
        await mem.add(memory_file)

        # Search across all memories.
        # min_score=0.0 ensures results surface in a small corpus;
        # in production the default 0.35 threshold filters low-confidence matches.
        results = await mem.search("caching layer decision", min_score=0.0)
        for r in results:
            print(f"[{r.score:.2f}] {r.snippet}  ← {r.path}:{r.start_line}")

asyncio.run(main())

Output:

[0.34] We use Valkey instead of Redis. Target latency SLA: 5ms p99.  ← memory/stack.md:1

Every result includes its relevance score, the exact file it came from, and the line number — full source provenance out of the box. No post-processing needed to trace where an answer originated.

And because memories are just files, you can inspect them with any tool you already have:

cat memory/stack.md
grep -r "Valkey" memory/
git diff memory/

That last command — git diff memory/ — is the one that changes how you think about agent memory. Every fact your agent stores is a line in a file. Every session is a commit. What your agent learned is as auditable as any other change in your codebase.

Why Files and SQLite Instead of a Vector Database

Vector databases were designed for large-scale document retrieval — millions of documents, multi-tenant services, and production search infrastructure. They are excellent at that job. Agent memory is a different job entirely: hundreds to thousands of files, personal or project-scoped, where the knowledge is as important as the code itself. These constraints pushed me toward a different set of tradeoffs:

memweave vs Vector Databases (image by author) 

Each of these differences compounds in practice, but version control illustrates the gap most concretely. Consider what happens when your agent stores a wrong assumption — say, it learned that your team uses PostgreSQL when you actually migrated to CockroachDB last quarter. With a vector DB, correcting this means finding the right embedding, deleting it, and re-inserting the corrected version via API. With memweave, you open the file and fix the line. Then you commit it.

# git diff memory/stack.md

- Database: PostgreSQL (primary), Redis (cache)
+ Database: CockroachDB (primary, migrated Q1 2026), Valkey (cache)
+ Reason: geo-distribution requirement from the platform team

That diff is now part of your project history. Any teammate — or any future agent — can see what changed, when, and why. This is the operational model that memweave is built around: agent memory as a first-class artifact of your project, not a side-effect stored in a service you can’t inspect.

Architecture

memweave is built around one central idea: separate storage from search. The Markdown files are the source of truth. The SQLite database is a derived index — always rebuildable, never irreplaceable.

┌──────────────────────────────────────────────────────────────┐
│                 SOURCE OF TRUTH  (Markdown files)            │
│   memory/MEMORY.md          ← evergreen knowledge            │
│   memory/2026-03-21.md      ← daily logs                     │
│   memory/researcher_agent/  ← agent-scoped namespace         │
└───────────────────────┬──────────────────────────────────────┘
                        │  chunking → hashing → embedding
┌───────────────────────▼──────────────────────────────────────┐
│                  DERIVED INDEX  (SQLite)                     │
│   chunks          - text + metadata                          │
│   chunks_fts      - FTS5 full-text index  (BM25)             │
│   chunks_vec      - sqlite-vec SIMD index (cosine)           │
│   embedding_cache - hash → vector  (compute once, reuse)     │
│   files           - SHA-256 change detection                 │
└───────────────────────┬──────────────────────────────────────┘
                        │  hybrid merge → post-processing
                        ▼
              list[SearchResult]

This separation has a practical consequence that is easy to overlook: losing the database is not data loss. Losing the files is. If the SQLite index is deleted or corrupted, await mem.index() rebuilds it completely from the Markdown files in the workspace. No data is gone. No embeddings need to be re-fetched if the cache is intact.

The Write Path

When you call await mem.add(path) or await mem.index(), memweave processes each file through a deterministic pipeline — no LLM involved at any step:

.md file
    │
    ▼
chunking                  - split into overlapping text chunks
    │
    ▼
sha256(chunk_text)        - fingerprint each chunk by content
    │
    ▼
embedding cache lookup    - bulk SQL query: which hashes are already cached?
    │
    ├── cache hit  ──────── reuse stored vector, skip API call
    │
    └── cache miss ──────── call embedding API (batched)
                │
                ▼
         store in cache   - write vector to embedding_cache table
                │
                ▼
    insert into FTS5 + sqlite-vec tables

The SHA-256 hash is the key efficiency lever. A chunk’s hash is determined entirely by its text content — so if a file is re-indexed and 90% of its chunks are unchanged, only the changed chunks trigger an API call. The rest are served from cache instantly.
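The caching idea can be sketched in a few lines of plain Python — an illustrative stand-in for the `embedding_cache` table, not memweave’s actual implementation (the `fake_embed` function and in-memory dict are assumptions for the demo):

```python
import hashlib

# Hypothetical in-memory stand-in for memweave's embedding_cache table.
cache: dict[str, list[float]] = {}
api_calls = 0

def fake_embed(text: str) -> list[float]:
    """Stand-in for a real embedding API call."""
    global api_calls
    api_calls += 1
    return [float(len(text))]  # placeholder vector

def embed_with_cache(chunks: list[str]) -> list[list[float]]:
    vectors = []
    for chunk in chunks:
        key = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if key not in cache:        # cache miss: call the API once
            cache[key] = fake_embed(chunk)
        vectors.append(cache[key])  # cache hit: reuse the stored vector
    return vectors

# First indexing pass: three chunks, three API calls.
embed_with_cache(["chunk a", "chunk b", "chunk c"])
# Re-index with one changed chunk: only one additional API call.
embed_with_cache(["chunk a", "chunk b", "chunk c CHANGED"])
print(api_calls)  # → 4
```

Because the key is a content hash, renaming or moving a file costs nothing at the embedding layer — only edited text triggers new API calls.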

The Search Path

When you call await mem.search(query), both search backends run in parallel against the same query and their results are merged before post-processing:

query
    │
    ├─── FTS5 BM25 (keyword) ─────────────────────┐
    │    exact term matching                      │
    │                                             ▼
    └─── sqlite-vec ANN (semantic) ──────► weighted merge
         cosine similarity                score = 0.7 × vector
                                               + 0.3 × BM25
                                                   │
                                                   ▼
                                          post-processing pipeline
                                          (threshold → decay → MMR)
                                                   │
                                                   ▼
                                          list[SearchResult]

Running both backends in parallel matters: BM25 catches exact matches — error codes, config values, proper names — while vector search catches semantically related content even when no keywords overlap. Together they cover the full range of how an agent’s memory is likely to be queried. The post-processing pipeline that follows the merge is covered in detail in the sections that follow.
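The weighted merge itself is simple enough to sketch directly. This toy version assumes both backends return already-normalised scores keyed by chunk id; a chunk missing from one backend contributes zero from that side:

```python
def hybrid_merge(vector_hits: dict, bm25_hits: dict, alpha: float = 0.7) -> dict:
    """Linearly combine normalised per-chunk scores from both backends."""
    ids = set(vector_hits) | set(bm25_hits)
    return {
        cid: alpha * vector_hits.get(cid, 0.0)
        + (1 - alpha) * bm25_hits.get(cid, 0.0)
        for cid in ids
    }

vec = {"stack.md:1": 0.9, "notes.md:4": 0.4}   # semantic scores
bm  = {"stack.md:1": 0.5, "log.md:2": 0.8}     # keyword scores
merged = hybrid_merge(vec, bm)
# stack.md:1 → 0.7 × 0.9 + 0.3 × 0.5 = 0.78
```

A chunk that both backends agree on (`stack.md:1`) naturally outranks chunks that only one backend found — which is exactly the behaviour you want from hybrid retrieval.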

Why SQLite as the Infrastructure Layer?

The choice of SQLite deserves a brief note. SQLite is not a compromise — it is a deliberate fit for this use case. It ships with Python, requires no server, supports full-text search via FTS5, and with the sqlite-vec extension gains SIMD-accelerated vector similarity search. The entire memory store — chunks, embeddings, cache, file metadata — is a single file on disk that you can copy, back up, or inspect with any SQLite browser. For the scale of agent memory (thousands of files), it is not just sufficient — it is optimal.
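You can verify the “ships with Python” claim in a REPL — FTS5 is compiled into the SQLite that standard CPython builds bundle, so keyword search with BM25 ranking works with zero dependencies (this sketch uses raw `sqlite3`, not memweave’s schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# FTS5 is available in the SQLite bundled with standard CPython builds.
con.execute("CREATE VIRTUAL TABLE chunks_fts USING fts5(text)")
con.executemany(
    "INSERT INTO chunks_fts(text) VALUES (?)",
    [("We use Valkey instead of Redis",),
     ("Target latency SLA: 5ms p99",)],
)
# bm25() returns a rank where lower is better, hence ascending ORDER BY.
rows = con.execute(
    "SELECT text, bm25(chunks_fts) FROM chunks_fts "
    "WHERE chunks_fts MATCH ? ORDER BY bm25(chunks_fts)",
    ("Valkey",),
).fetchall()
print(rows[0][0])  # → We use Valkey instead of Redis
```

The sqlite-vec extension adds the vector side the same way: another virtual table in the same database file.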

How memweave Organises Memory: Evergreen Files, Dated Logs, and Agent Namespaces

Not all knowledge ages equally. A team’s decision to use CockroachDB over PostgreSQL is as relevant today as the day it was made. A debugging note from a session six months ago probably isn’t. memweave enforces this distinction at the file level — no metadata tagging, no configuration, just a naming convention.

There are two types of memory files:

Types of memory files in memweave (image by author)

The rule is simple: any file whose name matches YYYY-MM-DD.md is dated. Everything else is evergreen. memweave reads the date directly from the filename — no file system metadata, no frontmatter parsing, no manual tagging.
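The whole classification rule fits in a regex. This sketch mirrors the convention described above; memweave’s exact validation logic may differ in detail:

```python
import re
from datetime import date

DATED = re.compile(r"(\d{4})-(\d{2})-(\d{2})\.md")

def classify(filename: str) -> str:
    """Dated if the name is exactly YYYY-MM-DD.md, evergreen otherwise."""
    m = DATED.fullmatch(filename)
    if not m:
        return "evergreen"
    y, mo, d = map(int, m.groups())
    date(y, mo, d)  # raises ValueError for impossible dates like 2026-13-40
    return "dated"

print(classify("2026-03-21.md"))    # → dated
print(classify("MEMORY.md"))        # → evergreen
print(classify("architecture.md"))  # → evergreen
```

Because the signal lives in the filename, the distinction survives any copy, clone, or checkout — unlike filesystem mtimes, which git does not preserve.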

A typical workspace organises itself naturally around this convention:

memory/
├── MEMORY.md                  ← evergreen - permanent facts, always surfaces
├── architecture.md            ← evergreen - stack decisions, constraints
├── 2026-01-15.md              ← dated - session notes from January
├── 2026-03-10.md              ← dated - session notes from March
├── 2026-04-11.md              ← dated - today's session, full score for now
└── researcher_agent/
    ├── findings.md            ← evergreen - agent's standing knowledge
    └── 2026-04-11.md          ← dated - agent's session log, will decay

Over time, the dated files accumulate and fade. The evergreen files remain anchored at full score regardless of how much history builds up around them. An agent asking about the tech stack always gets architecture.md at the top of its results — even if hundreds of session logs have been written since.

Agent Namespaces (Enabling Multi-Agent Memory)

When multiple agents share one workspace, you need a way to keep their knowledge isolated without spinning up separate databases. memweave handles this through subdirectories. The immediate subdirectory under memory/ becomes the source label for every file inside it:

memweave agent namespaces examples (image by author)

Each agent writes to its own subdirectory. All agents index against the same SQLite database. Searches are global by default — any agent can read any other agent’s memories. Pass source_filter to scope a search exclusively to one namespace:

# Both agents share one workspace - and therefore one SQLite index
researcher = MemWeave(MemoryConfig(workspace_dir="./project"))
writer     = MemWeave(MemoryConfig(workspace_dir="./project"))

async with researcher, writer:
    # Researcher indexes its findings under memory/researcher_agent/
    await researcher.index()

    # Writer queries only the researcher's namespace
    results = await writer.search(
        "water ice on the Moon",
        source_filter="researcher_agent",
    )

This pattern scales naturally to any number of agents. Each agent’s knowledge is isolated by path convention, inspectable as a folder, and versionable independently — git log memory/researcher_agent/ shows exactly what that agent learned and when.
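The path-to-namespace mapping can be sketched as follows. The rule for files inside a subdirectory matches the description above; the label given to top-level files under memory/ is an assumption for illustration:

```python
from pathlib import PurePosixPath

def source_label(path: str) -> str:
    """First directory under memory/ becomes the namespace label.
    The "root" label for top-level files is an assumption of this sketch."""
    parts = PurePosixPath(path).parts
    assert parts[0] == "memory", "expected a path under memory/"
    return parts[1] if len(parts) > 2 else "root"

print(source_label("memory/researcher_agent/findings.md"))  # → researcher_agent
print(source_label("memory/MEMORY.md"))                     # → root
```

No registration step, no agent identity table: the namespace is whatever directory the agent writes into.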

memweave Search Pipeline

Every mem.search(query) call moves through five fixed stages in order. Each stage is independent, composable, and tunable. Here is the full pipeline, then each stage in detail.

5-stage memweave search pipeline (image by author)

Stage 1 — Hybrid Score Merge

Both backends run in parallel against the same query and their scores are normalised then linearly combined:

merged_score = α × vector_score + (1 − α) × bm25_score

Default α = 0.7. Each backend contributes what it does best:

  • FTS5 BM25 ranks by term frequency and inverse document frequency. It is a precision anchor — exact technical terms, error codes, config values, and proper names score high. If your query and your document use the same words, BM25 finds it.
  • sqlite-vec cosine similarity measures distance in embedding space. It catches semantically related content even when no keywords overlap — a query for “caching layer” will surface a chunk mentioning “Redis latency” because the embeddings are close, even though the words differ.

The 70/30 split reflects the nature of most agent memory queries: conceptual and paraphrased more often than exact-string lookups. Tune the weights via HybridConfig if your use case skews toward precise technical retrieval:

from memweave.config import MemoryConfig, QueryConfig, HybridConfig

config = MemoryConfig(
    query=QueryConfig(
        hybrid=HybridConfig(
            vector_weight=0.5,   # equal weight for keyword-heavy corpora
            text_weight=0.5,
        )
    )
)

Stage 2 — Score Threshold

drop result if merged_score < min_score   (default: 0.35)

A noise gate that runs before the more expensive post-processing stages. Without it, low-confidence tail results enter MMR and decay calculations and waste compute. The default of 0.35 is calibrated for typical agent memory corpora — lower it for small workspaces where you want more results to surface, raise it when precision matters more than recall.

# Override per call - no config change needed
results = await mem.search("architecture decision", min_score=0.5)

Stage 3 — Temporal Decay (opt-in)

Agents accumulate knowledge over time, but not all knowledge ages equally. Without decay, a stale debugging note from six months ago can outrank a decision made this morning simply because it embeds well. Temporal decay solves this by multiplying each result’s score by an exponential factor based on the age of its source file.
The formula is standard exponential decay:

λ             = ln(2) / half_life_days
multiplier    = exp(−λ × age_days)
decayed_score = original_score × multiplier

At age_days = 0 the multiplier is 1.0 — no change. At age_days = half_life_days it is exactly 0.5. The curve is smooth and continuous: scores are never zeroed, old memories still surface, they simply rank lower than recent ones.
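The formula translates directly into code — a few lines of standard-library math, shown here as an illustration of the Stage 3 arithmetic rather than memweave’s internals:

```python
import math

def decay_multiplier(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: 1.0 at age 0, exactly 0.5 at one half-life."""
    lam = math.log(2) / half_life_days
    return math.exp(-lam * age_days)

print(decay_multiplier(0))   # ≈ 1.0  (today's file: no penalty)
print(decay_multiplier(30))  # ≈ 0.5  (one half-life old)
print(decay_multiplier(90))  # ≈ 0.125 (three half-lives old)
```

Note that the multiplier only asymptotically approaches zero — a sufficiently strong semantic match can still surface from deep history.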

Evergreen files bypass this stage entirely — their multiplier is always 1.0 regardless of when they were written.

from memweave.config import MemoryConfig, QueryConfig, TemporalDecayConfig

config = MemoryConfig(
    query=QueryConfig(
        temporal_decay=TemporalDecayConfig(
            enabled=True,
            half_life_days=30.0,  # tune to your workflow
        )
    )
)

Tune half_life_days to your workflow: 7 for fast-moving projects where week-old context is already stale, 90 for research or documentation repositories where knowledge stays relevant for months.

Stage 4 — MMR Re-ranking (opt-in)

Without diversity control, the top results from a hybrid search are often near-duplicates — multiple chunks from the same file, or different phrasings of the same fact. An agent loading all of them into its context window wastes tokens and misses other relevant but distinct memories.

MMR (Maximal Marginal Relevance) reorders results after scoring to balance relevance against diversity. At each selection step it picks the candidate that maximises:

MMR(cᵢ) = λ × relevance(cᵢ) − (1 − λ) × max sim(cᵢ, cⱼ)  for cⱼ ∈ S

Where:
S = set of already-selected results
relevance(cᵢ) = merged score from Stage 1, after temporal decay
sim(cᵢ, cⱼ) = Jaccard token overlap between candidate and each selected result
λ = diversity dial — 0 is pure diversity, 1 is pure relevance, default 0.7

Why Jaccard overlap rather than cosine similarity?

Two chunks that share many of the same words — even from different files — are genuinely redundant for an agent loading them as context. Jaccard catches this at the token level without requiring an additional embedding call per pair.

┌──────────────┬─────────────────────────────────────────────────────────┐
│ lambda_param │ Behaviour                                               │
├──────────────┼─────────────────────────────────────────────────────────┤
│ 1.0          │ Pure relevance — identical to no MMR                    │
│ 0.7          │ Default — strong relevance, light diversity push        │
│ 0.5          │ Equal weight between relevance and diversity            │
│ 0.0          │ Pure diversity — maximally novel results                │
└──────────────┴─────────────────────────────────────────────────────────┘

from memweave.config import MemoryConfig, QueryConfig, MMRConfig

config = MemoryConfig(
    query=QueryConfig(
        mmr=MMRConfig(enabled=True, lambda_param=0.7)
    )
)

# Or override λ per call without touching the config
diverse_results = await mem.search("deployment steps", mmr_lambda=0.3)
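The greedy selection loop behind the formula can be sketched in plain Python. This is an illustration of the MMR technique with Jaccard overlap as described above, not memweave’s internal code; the tuple shape `(score, text)` is an assumption of the sketch:

```python
def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard overlap between two text chunks."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def mmr(candidates: list[tuple[float, str]], k: int = 3, lam: float = 0.7):
    """Greedily pick k results balancing relevance against redundancy."""
    selected: list[tuple[float, str]] = []
    pool = list(candidates)
    while pool and len(selected) < k:
        best = max(
            pool,
            key=lambda c: lam * c[0]
            - (1 - lam) * max((jaccard(c[1], s[1]) for s in selected), default=0.0),
        )
        selected.append(best)
        pool.remove(best)
    return selected

docs = [
    (0.90, "use valkey for the caching layer"),
    (0.88, "use valkey for the caching layer today"),  # near-duplicate
    (0.60, "deployment runs through github actions"),
]
picked = mmr(docs, k=2, lam=0.5)
# The near-duplicate is skipped in favour of the distinct third result.
```

With `lam=0.5`, the near-duplicate’s high overlap penalty pushes it below the lower-scoring but novel deployment note — exactly the context-window behaviour the stage is designed for.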

Stage 5 — Custom Post-processors

Any processors registered via mem.register_postprocessor() run last, in registration order. Each receives the output of the previous stage and can filter, reorder, or rescore freely — domain-specific boosting, hard pinning a result to the top, or integrating an external signal. The built-in pipeline runs first; custom stages extend it without replacing it.
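As a sketch of what such a stage might look like, here is a hypothetical postprocessor that pins results from MEMORY.md to the top. The list-in, list-out shape and the dict result fields are assumptions of this sketch; check memweave’s documentation for the exact callable signature `register_postprocessor()` expects:

```python
# Hypothetical postprocessor: pin MEMORY.md results to the top of the list.
# Assumes a processor is a plain callable: list of results in, list out.

def pin_memory_file(results: list[dict]) -> list[dict]:
    pinned = [r for r in results if r["path"].endswith("MEMORY.md")]
    rest = [r for r in results if not r["path"].endswith("MEMORY.md")]
    return pinned + rest

results = [
    {"path": "memory/2026-01-15.md", "score": 0.81},
    {"path": "memory/MEMORY.md", "score": 0.74},
]
ordered = pin_memory_file(results)
# Registration would then look something like:
#   mem.register_postprocessor(pin_memory_file)
```

Because the built-in pipeline runs first, a custom stage like this sees already-merged, thresholded, decayed, and diversified results — it only has to express the domain rule.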

Real-World Example using memweave — Book Club Decision Log

The best way to see memweave in action is to watch two agents answer the same question with different retrieval strategies. The full runnable notebook is available at github.com/sachinsharma9780/memweave/blob/main/examples/book_club_demo.ipynb.

The Setup

The workspace contains 9 memory files spanning 18 months of a book club’s history:

Book Club dataset (image by author)

One evergreen file holds standing information that should always surface at full score. Seven dated files accumulate the club’s history. One file written today holds the current state.

The Question

Both agents are asked the same question:

“What genre did the club vote on most recently?”

The correct answer — grounded in the most recent information — is science fiction, with literary fiction likely next. But an agent without temporal awareness will not necessarily find this.

Agent A — No Temporal Decay

config = MemoryConfig(
    workspace_dir=WORKSPACE,
    embedding=EmbeddingConfig(model="text-embedding-3-small"),
)

async with MemWeave(config) as mem:
    results = await mem.search(
        "What genre did the club vote on most recently?",
        max_results=3,
        min_score=0.1,
    )

Agent A’s top 3 results by raw semantic similarity:

[0.339]  2025-11-03.md   ← Non-fiction vote (5 months ago)
[0.336]  2024-10-05.md   ← Fantasy vote (18 months ago)
[0.320]  2025-05-10.md   ← Mystery vote (11 months ago)

Today’s file does not appear in the top 3. The older “vote” files outscore it on raw semantic similarity because they contain more explicit voting language. Agent A’s answer:

“The club most recently voted on the genre of non-fiction.”

Factually stale — the November 2025 vote, not the most recent one.

Agent B — With Temporal Decay (half_life = 90 days)

async with MemWeave(config) as mem:
    results = await mem.search(
        "What genre did the club vote on most recently?",
        max_results=3,
        min_score=0.1,
        decay_half_life_days=90.0,
    )

Agent B’s top 3 results after the age penalty:

[0.313]  2026-04-11.md   ← Today's notes (multiplier: 1.00) ↑ rank 1
[0.293]  club_info.md    ← Evergreen     (multiplier: 1.00)
[0.128]  2025-12-30.md   ← Sci-fi plan   (multiplier: ~0.46)

Today’s file floats to rank 1 after the age penalty collapses the scores of older files. The end-of-year review retains ~46% of its score; the November 2025 non-fiction vote drops out of the top 3 entirely. 
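The ~0.46 multiplier checks out against the Stage 3 formula, taking the demo’s “today” of 2026-04-11 as the reference date:

```python
import math
from datetime import date

# Age of the end-of-year review relative to the demo's "today".
age = (date(2026, 4, 11) - date(2025, 12, 30)).days  # 102 days

# Stage 3 decay with half_life_days = 90.
multiplier = math.exp(-math.log(2) / 90.0 * age)
print(round(multiplier, 2))  # → 0.46
```

At just over one half-life of age, the file keeps a bit under half its raw score — enough to stay in the top 3, but not enough to outrank today’s notes.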

Agent B’s answer, grounded in today’s file:

“The club most recently voted for science fiction.”

What This Demonstrates

  • The stale memory problem is real and silent. Agent A does not know it is wrong. It returns a confident answer based on the highest-scoring semantic matches — which happen to be older files with more explicit voting language. There is no error, no warning, just subtly outdated context.
  • Decay’s advantage compounds with history. With 18 months of files, Agent A’s context fills with increasingly stale votes. The larger the memory grows, the worse the problem becomes — and the more dramatic the difference between the two agents.
  • club_info.md (evergreen) surfaces in Agent B at full score. With decay enabled, the age penalty clears out stale vote records, and the evergreen standing info rises into the top 3 — despite never being the closest semantic match to the query. In Agent A, older dated files with explicit voting language outscore it on raw similarity. Evergreen immunity is determined by the file path, not the content.
  • A single parameter change is all it takes. decay_half_life_days=90.0 is the only difference between Agent A and Agent B. No schema changes, no re-indexing, no metadata tagging.

Summary

Agent memory does not have to mean infrastructure. memweave takes a different bet: memories are plain Markdown files you can open, edit, and git diff. A local SQLite database indexes them for hybrid search — BM25 for exact matches, vector search for semantic retrieval, merged into a single ranked list. Temporal decay keeps recent context above stale history automatically. MMR ensures the top results cover different aspects of your query rather than repeating the same fact. An embedding cache means only changed content ever hits the API. The entire store is a single file on disk — no server, no Docker, no cloud service.

The book club demo makes the tradeoff concrete: two agents, one question, one parameter difference, two different answers. The agent with temporal decay surfaces today’s file at rank one. The agent without it surfaces a five-month-old vote with more explicit “voting” language — and confidently gives the wrong answer without knowing it.

The broader point is that the stale-memory problem is silent. There is no error, no warning — just subtly outdated context fed to the model. The larger the memory grows, the more stale files accumulate, and the more aggressively they compete with recent ones on raw semantic similarity. Temporal decay is what keeps retrieval honest as history builds up.

Get Started

pip install memweave
  • Book club demo: github.com/sachinsharma9780/memweave/blob/main/examples/book_club_demo.ipynb
  • Meeting notes agent: github.com/sachinsharma9780/memweave/blob/main/examples/meeting_notes_agent.ipynb
  • GitHub: github.com/sachinsharma9780/memweave
  • PyPI: pypi.org/project/memweave

If you hit something unexpected, find a use case the library doesn’t cover well, or just want to share what you built — open an issue or start a discussion on GitHub. Feedback is genuinely appreciated.
