Building a Personal AI Brain: Qdrant, Embeddings, and the Illusion of Memory

Ayra ix · 7 min read

Vector databases sound magical until you actually use one

Thousands of vector points · 2 collections · local embedding model · 768 dimensions

It was a Saturday afternoon of debugging a self-hosted vector-memory setup and feeling very clever about the whole thing, so it seemed reasonable to ask: "What's the status of my recent project follow-ups?" The answer that came back was a note about needing to pick up eggs, yogurt, and a specific brand of oat milk.

Not a hallucination. Not a bug. A mathematically perfect cosine similarity match. The grocery list used words like "items," "pending," "need to check," and "follow up" — the exact same vocabulary fingerprint as a project status note. The embedding model found patterns. It found its nearest neighbor in vector space. It returned results with full confidence.

It was correct by the only measure it knew. That's the moment the illusion breaks — and where this post starts.

Figure: the query "status of recent project follow-ups" landed in the yellow zone. The grocery list was the nearest neighbor: high cosine similarity, zero relevance.

What This Is Actually For

Before getting into what went wrong, here's the reason the system exists: context initialization. When a new AI coding session starts, the first thing the assistant does is query a knowledge base for context relevant to the day's work. That few-second query replaces long stretches of re-explaining project history, decisions made, things tried, things abandoned.

A representative setup has three parts. First, a self-hosted vector database. Second, a local embedding model generating vectors (local inference, so no API calls and no per-query cost). Third, two collections: a knowledge base with thousands of points of session notes, task outcomes, and project summaries, plus a smaller research archive. Operating cost: negligible beyond hardware already running.

The collection gets populated automatically. After each session, a background script pulls key facts and decisions from the session transcript, formats them into clean structured notes, generates embeddings, and upserts them into the vector database. No manual curation step. The curation happens upstream — during the session, when decisions and reasoning are stated explicitly. Garbage in, grocery lists out. Good structured notes in, useful memory out.

Session end (transcript ready) → Extract script (background job) → LLM extract (key facts + decisions) → Chunk (~384t / 15% overlap) → Embed (local · 768d) → Vector upsert (+ domain metadata) → Knowledge base (self-hosted)
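The pipeline above can be sketched end to end in a few lines. This is a minimal sketch under stated assumptions, not the actual background script: `extract_notes`, `embed`, `upsert`, and the in-memory `store` dictionary are hypothetical stand-ins for the LLM extraction step, the local 768-dimension embedding model, and the vector database. Chunking is left out because each extracted note here is already short.

```python
import hashlib
import uuid

DIM = 768  # matches the 768-dimension local model described in the article


def extract_notes(transcript: str) -> list[str]:
    """Stand-in for the LLM extraction step: keep only explicitly marked lines."""
    prefixes = ("DECISION:", "FACT:")
    return [line.strip() for line in transcript.splitlines()
            if line.strip().startswith(prefixes)]


def embed(text: str) -> list[float]:
    """Placeholder embedding: a deterministic pseudo-vector derived from a hash.
    It carries no semantic meaning; a real pipeline calls the local model here."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [digest[i % len(digest)] / 255.0 for i in range(DIM)]


def upsert(store: dict, note: str, domain: str, source: str) -> str:
    """Insert or overwrite a point keyed by a content-derived ID, so re-runs are idempotent."""
    point_id = str(uuid.uuid5(uuid.NAMESPACE_URL, f"{source}:{note}"))
    store[point_id] = {
        "vector": embed(note),
        "payload": {"text": note, "domain": domain, "source": source},
    }
    return point_id


def ingest_session(store: dict, transcript: str, domain: str, source: str) -> int:
    """Run extract -> embed -> upsert for one transcript; return the number of notes stored."""
    notes = extract_notes(transcript)
    for note in notes:
        upsert(store, note, domain, source)
    return len(notes)


transcript = """\
chatter that should not be stored
DECISION: use 384-token chunks with 15% overlap
FACT: knowledge base and research archive are separate collections
"""
store: dict = {}
count = ingest_session(store, transcript, domain="work", source="session-001")
ingest_session(store, transcript, domain="work", source="session-001")  # re-run adds no duplicates
print(count, len(store))
```

Deriving the point ID from the source and note text is one design choice among several; it makes the job safe to re-run after a crash, at the cost of treating an edited note as a new point.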

"AI doesn't remember. It retrieves. The quality of retrieval determines the quality of the memory illusion."

At thousands of points, it's genuinely useful. Asking about past architecture decisions returns the actual reasoning from months ago. That's the memory illusion working as intended.

How It Works (The Two-Sentence Version)

An embedding model converts text into a list of numbers — a vector — where similar-meaning text produces similar vectors. Vector search finds the stored vectors nearest to the query vector. That's it. It's a nearest-neighbors problem, not memory. The system doesn't "know" your work tasks any more than a search engine "remembers" your search history — it finds what's mathematically most similar to what you asked.
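The nearest-neighbors framing is easy to see in code. The following is a minimal brute-force sketch, assuming tiny hand-written 3-dimensional vectors in place of real 768-dimensional embeddings; the names `cosine`, `nearest`, and the `note-*` points are illustrative only. Production vector databases typically use approximate indexes rather than scoring every point, but the ranking idea is the same.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(query: list[float], points: dict[str, list[float]], k: int = 3):
    """Brute-force k-nearest neighbors: score every stored vector, sort, keep the top k."""
    scored = [(cosine(query, vec), name) for name, vec in points.items()]
    scored.sort(reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional vectors; a real model would emit 768 dimensions.
points = {
    "note-a": [0.9, 0.1, 0.0],
    "note-b": [0.1, 0.9, 0.0],
    "note-c": [0.0, 0.1, 0.9],
}
top = nearest([1.0, 0.2, 0.0], points, k=2)
for score, name in top:
    print(f"{name}: {score:.3f}")
```

Nothing in `nearest` knows what the notes are about. It only compares directions, which is why the result is a ranking and not a recollection.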

The Cosine Similarity Trap

Cosine similarity measures the angle between two vectors — how similar their direction is in high-dimensional space. It's excellent at capturing semantic similarity. It will also confidently return a grocery list when you ask about project follow-ups if those two things happen to live near each other in the model's concept space. Similarity is not relevance. This distinction breaks systems that don't account for it.

Similarity Search Result: The Grocery List Incident

Query: "what is the status of my recent project follow-ups"

Rank   Document             Score   Expected
1      grocery-list.md      0.847   ✗ WRONG
2      project-tracker.md   0.831   ✓
3      session-notes.md     0.802   ✓
4      workflow-config.md   0.789   ✓
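The incident can be reproduced in miniature. The sketch below is a deliberately crude illustration, assuming bag-of-words count vectors in place of a neural embedding model and made-up document texts; a real model captures far more than word overlap, but the failure has the same shape: shared vocabulary pulls an irrelevant document closer to the query than the relevant one.

```python
import math
from collections import Counter


def bow(text: str) -> Counter:
    """Toy 'embedding': word counts. A real system would use a neural model."""
    return Counter(text.lower().split())


def cosine_sparse(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Hypothetical texts written to mimic the vocabulary overlap described above.
query = "status of pending follow up items"
docs = {
    "grocery-list.md": "pending items need to check and follow up eggs yogurt oat milk",
    "project-tracker.md": "project tracker migration blocked awaiting status review",
}

scores = {name: cosine_sparse(bow(query), bow(text)) for name, text in docs.items()}
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

The grocery list shares four words with the query and the project tracker shares one, so the grocery list ranks first even though only the tracker is relevant.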

The Chunk Size Problem Nobody Tells You About

When you index documents for vector search, you split them into chunks first. The size of those chunks is one of the most consequential decisions you'll make — and almost nothing online tells you how to actually choose.

Too small: a 64-token chunk might be a single sentence. "The project is on hold." Nearly meaningless when retrieved. On hold because of what? Which project? Who decided?

Too large: a 2048-token chunk can contain 15 different topics. The embedding tries to represent all of them and ends up representing none of them well. The retrieval signal dilutes across everything crammed into the chunk.

Chunk Size Comparison: 256t vs 512t with 15% overlap

256-token chunks (15% overlap = ~38 tokens): 4 chunks · tight context · fast retrieval · lower noise per chunk

512-token chunks (15% overlap = ~76 tokens): 2 chunks · richer context · larger candidate pool · more topic dilution risk

Legend: content, and overlap region (prevents split-boundary loss)

The Goldilocks Range (Tested Numbers)

For factual notes and project summaries: 256–512 tokens with 10% overlap. For conversational content: 512–1024 tokens. The overlap is not optional — it prevents relevant context from being split across chunk boundaries and lost. A knowledge base collection like this one tends to settle around 384 tokens with 15% overlap after testing with real queries. Your content type will produce different numbers. Test with your real data, not synthetic examples. There is no universal answer and anyone who gives you one hasn't tested it.
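Overlapping chunks are a sliding window over the token sequence. Here is a minimal sketch, assuming the tokens are already available as a list; `chunk_tokens` is a hypothetical helper, and the synthetic `t0..t999` tokens stand in for the output of whatever tokenizer the embedding model uses.

```python
def chunk_tokens(tokens: list[str], size: int = 384,
                 overlap_ratio: float = 0.15) -> list[list[str]]:
    """Split tokens into windows of `size`; each window repeats the tail of the previous one."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0.0 <= overlap_ratio < 1.0:
        raise ValueError("overlap_ratio must be in [0, 1)")
    overlap = int(size * overlap_ratio)  # 384 * 0.15 -> 57 tokens
    step = size - overlap                # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks


# Synthetic tokens; a real pipeline would use the embedding model's tokenizer.
tokens = [f"t{i}" for i in range(1000)]
chunks = chunk_tokens(tokens, size=384, overlap_ratio=0.15)
print(len(chunks), [len(c) for c in chunks])
```

Keeping `size` and `overlap_ratio` as parameters makes it cheap to re-chunk the same content at several settings and compare retrieval quality before committing to one.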

The Fix: Domain Filtering Before Vector Search

The grocery list incident had a specific root cause: the knowledge base was a single flat namespace. Personal reminders, grocery lists, infrastructure decisions, and project notes all lived in the same vector space and competed on the same similarity metric. A grocery list that used task-tracking language was indistinguishable from a work note at query time.

The fix was adding a domain metadata field to every stored point — values like work, infra, personal, research — and rewriting all queries to filter by domain before running the vector search. Hybrid retrieval: metadata filter first, then vector search inside that filtered subset. The false positive rate dropped to near zero.

Before: Query "status of recent project follow-ups" against the full collection. Results ranked by cosine similarity across all points. Grocery list wins on vocabulary overlap.

After: Query pre-filtered to domain = "work", then vector search within that subset. Only work-related notes are in the candidate pool. The grocery list isn't even evaluated. The correct result returns in the top 3 every time.
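The before and after behavior can be sketched with an in-memory list standing in for the collection. This is an illustration under assumptions, not the actual query code: the 2-dimensional vectors are hand-picked so the grocery list sits closest to the query, and `search` is a hypothetical helper.

```python
import math


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(points, query_vec, domain=None, k=3):
    """Restrict candidates by domain first (if given), then rank the rest by cosine similarity."""
    candidates = [p for p in points if domain is None or p["domain"] == domain]
    ranked = sorted(candidates, key=lambda p: cosine(query_vec, p["vector"]), reverse=True)
    return [p["name"] for p in ranked[:k]]


# Hypothetical 2-d vectors chosen so the grocery list is the nearest neighbor.
points = [
    {"name": "grocery-list.md", "domain": "personal", "vector": [0.98, 0.20]},
    {"name": "project-tracker.md", "domain": "work", "vector": [0.95, 0.31]},
    {"name": "session-notes.md", "domain": "work", "vector": [0.90, 0.44]},
    {"name": "workflow-config.md", "domain": "work", "vector": [0.85, 0.53]},
]
query = [1.0, 0.15]

before = search(points, query)                 # flat namespace
after = search(points, query, domain="work")   # metadata filter, then vector search
print("before:", before)
print("after: ", after)
```

The filter runs before any similarity score is computed, so the grocery list never enters the candidate pool. In a vector database such as Qdrant the same idea is expressed as a payload filter passed along with the search request; the exact client call depends on the client version and is not shown here.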

The lesson isn't that vector search is flawed — it's that semantic similarity across unrelated domains is the wrong retrieval problem to solve. Namespace your data. Filter before you search. The vector similarity does its job well when it's operating on a coherent domain, not a soup of everything ever written down.

Before You Build a Vector Memory System

You've defined domain or namespace boundaries for your collections — grocery lists and work tasks must never compete for the same query

You've tested chunk sizes with your actual content type and measured retrieval quality before indexing everything

You've added metadata fields (domain, date, source) so you can filter before vector search, not only after

You understand you're building a search index, not memory — and you've designed your storage format to be specific and structured
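"Measured retrieval quality" in the checklist above can be as simple as a hit rate over a handful of labeled queries. The sketch below assumes a small hand-labeled set of (query, expected document) pairs and a `search_fn` that returns ranked document names; `hit_rate_at_k`, `fake_search`, and the canned results are all hypothetical stand-ins for a real retriever.

```python
from typing import Callable


def hit_rate_at_k(labeled: list[tuple[str, str]],
                  search_fn: Callable[[str, int], list[str]],
                  k: int = 3) -> float:
    """Fraction of queries whose expected document appears in the top-k results."""
    if not labeled:
        return 0.0
    hits = sum(1 for query, expected in labeled if expected in search_fn(query, k))
    return hits / len(labeled)


# Hypothetical canned rankings standing in for a real retriever.
canned = {
    "status of project follow-ups": ["grocery-list.md", "project-tracker.md", "session-notes.md"],
    "why was the migration paused": ["session-notes.md", "workflow-config.md"],
    "which chunk size did we pick": ["grocery-list.md", "workflow-config.md"],
}


def fake_search(query: str, k: int) -> list[str]:
    """Return the first k canned results for a query, or nothing if unknown."""
    return canned.get(query, [])[:k]


labeled = [
    ("status of project follow-ups", "project-tracker.md"),
    ("why was the migration paused", "session-notes.md"),
    ("which chunk size did we pick", "session-notes.md"),
]
rate_at_3 = hit_rate_at_k(labeled, fake_search, k=3)
rate_at_1 = hit_rate_at_k(labeled, fake_search, k=1)
print(f"hit rate@3: {rate_at_3:.2f}  hit rate@1: {rate_at_1:.2f}")
```

Running the same labeled set against each chunk size or filter configuration gives a number to compare, which is more reliable than eyeballing a few results.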

What Most Vector DB Tutorials Miss

They show you happy-path retrieval. "Ask about Paris, get the Paris document." Nobody shows you the grocery list problem, where similarity is high but relevance is zero. They don't cover the metadata filter pattern that fixes it. They don't mention that domain-specific vocabulary (industry jargon, internal project names) may not embed well without fine-tuning, since embedding models are trained mostly on general internet text. They don't tell you that a good local embedding model can outperform hosted alternatives on several benchmarks while running for free. Read the model's paper, not just the tutorial.

The Honest Summary

A self-hosted vector database is excellent software. Run reliably at home, it costs nothing extra and makes every session more productive by eliminating the context re-establishment overhead that silently eats 10-15 minutes of every conversation.

But the grocery list was the most useful thing that happened to this project. It broke a mental model before a production pipeline got built on a false assumption — that vector similarity equals relevance. Add a domain field. Filter before you search. Be specific in what you store. Do those three things and the memory illusion works well enough to be genuinely useful.

Skip them and you'll get grocery lists. Which, in retrospect, is a pretty good outcome for catching a fundamental design flaw early.