Building a Personal AI Brain: Qdrant, Embeddings, and the Illusion of Memory
Vector databases sound magical until you actually use one
It was a Saturday afternoon spent debugging a self-hosted vector-memory setup, and the whole thing felt very clever. It seemed reasonable to ask: "What's the status of my recent project follow-ups?" The answer that came back was a note about needing to pick up eggs, yogurt, and a specific brand of oat milk.
Not a hallucination. Not a bug. A mathematically perfect cosine similarity match. The grocery list used words like "items," "pending," "need to check," and "follow up" — the exact same vocabulary fingerprint as a project status note. The embedding model found patterns. It found its nearest neighbor in vector space. It returned results with full confidence.
It was correct by the only measure it knew. That's the moment the illusion breaks — and where this post starts.
What This Is Actually For
Before getting into what went wrong, here's the reason the setup exists: context initialization. When a new AI coding session starts, its first step is to query a knowledge base for context relevant to the day's work. That few-second query replaces long stretches of re-explaining project history, decisions made, things tried, things abandoned.
A representative setup has three parts: a self-hosted vector database; a local embedding model generating vectors (local inference, so no API calls and no per-query cost); and two collections, one a knowledge base with thousands of points of session notes, task outcomes, and project summaries, the other a smaller research archive. Operating cost: negligible beyond hardware already running.
The collection gets populated automatically. After each session, a background script pulls key facts and decisions from the session transcript, formats them into clean structured notes, generates embeddings, and upserts them into the vector database. No manual curation step. The curation happens upstream — during the session, when decisions and reasoning are stated explicitly. Garbage in, grocery lists out. Good structured notes in, useful memory out.
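The embed-and-upsert step of that pipeline can be sketched in a few lines. This is a minimal illustration, not the actual background script: toy_embed is a hypothetical stand-in for the local embedding model (it hashes words into buckets and carries no semantic meaning), the store is a plain dict rather than a vector database, and the session payload field is an assumed name. The one design choice worth noting is the deterministic ID, which makes re-running the script overwrite a note instead of duplicating it.

```python
import hashlib
import uuid


def toy_embed(text, dims=8):
    """Placeholder embedding: hashes each word into one of `dims` buckets.

    NOT a semantic embedding. A real setup would call the local model here.
    """
    vector = [0.0] * dims
    for word in text.lower().split():
        bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % dims
        vector[bucket] += 1.0
    return vector


def upsert_note(store, text, session):
    """Insert or overwrite a note, keyed by an ID derived from its text."""
    # Same text always yields the same ID, so repeated runs do not duplicate.
    point_id = str(uuid.uuid5(uuid.NAMESPACE_URL, text))
    store[point_id] = {
        "vector": toy_embed(text),
        "payload": {"text": text, "session": session},  # hypothetical fields
    }
    return point_id


store = {}
note = "Decision: keep the job queue in-process; external broker abandoned."
first_id = upsert_note(store, note, session="session-001")
second_id = upsert_note(store, note, session="session-002")
```

Upserting the same note twice leaves one point in the store, with the payload from the later run.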
"AI doesn't remember. It retrieves. The quality of retrieval determines the quality of the memory illusion."
At thousands of points, it's genuinely useful. Asking about past architecture decisions returns the actual reasoning from months ago. That's the memory illusion working as intended.
How It Works (The Two-Sentence Version)
An embedding model converts text into a list of numbers — a vector — where similar-meaning text produces similar vectors. Vector search finds the stored vectors nearest to the query vector. That's it. It's a nearest-neighbors problem, not memory. The system doesn't "know" your work tasks any more than a search engine "remembers" your search history — it finds what's mathematically most similar to what you asked.
Cosine similarity measures the angle between two vectors — how similar their direction is in high-dimensional space. It's excellent at capturing semantic similarity. It will also confidently return a grocery list when you ask about project follow-ups if those two things happen to live near each other in the model's concept space. Similarity is not relevance. This distinction breaks systems that don't account for it.
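To make the mechanics concrete, here is a minimal sketch of cosine similarity and a brute-force nearest-neighbor lookup in plain Python. The three-dimensional vectors and their labels are invented for illustration; real embeddings have hundreds of dimensions, and a vector database uses an index rather than scoring every point.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for zero vectors; the angle is undefined
    return dot / (norm_a * norm_b)


def nearest(query, stored, k=3):
    """Return the k (score, label) pairs most similar to the query."""
    scored = [(cosine_similarity(query, vec), label) for label, vec in stored.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical vectors: the grocery list sits very close to the status note.
stored = {
    "grocery list": [0.9, 0.4, 0.1],
    "project status note": [0.8, 0.5, 0.2],
    "router config": [0.1, 0.2, 0.9],
}
query = [0.88, 0.42, 0.12]  # stands in for "status of my project follow-ups"
results = nearest(query, stored, k=2)
```

With these made-up numbers the grocery list ranks first and the project note second. Nothing in the math knows which of the two the question was about.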
[Figure: similarity search result from the grocery list incident]
The Chunk Size Problem Nobody Tells You About
When you index documents for vector search, you split them into chunks first. The size of those chunks is one of the most consequential decisions you'll make — and almost nothing online tells you how to actually choose.
Too small: a 64-token chunk might be a single sentence. "The project is on hold." Nearly meaningless when retrieved — on hold because of what? Which project? Who decided? Too large: a 2048-token chunk contains 15 different topics. The embedding tries to represent all of them and ends up representing none of them well. The retrieval signal dilutes across everything crammed into the chunk.
For factual notes and project summaries: 256–512 tokens with 10% overlap. For conversational content: 512–1024 tokens. The overlap is not optional — it prevents relevant context from being split across chunk boundaries and lost. A knowledge base collection like this one tends to settle around 384 tokens with 15% overlap after testing with real queries. Your content type will produce different numbers. Test with your real data, not synthetic examples. There is no universal answer and anyone who gives you one hasn't tested it.
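A sliding-window chunker with overlap is short enough to sketch here. The function name and defaults are illustrative (the defaults mirror the 384-token, 15% figures above), and the function takes a pre-tokenized list. How you tokenize is an assumption left to you: splitting on whitespace is a rough stand-in, and counts from your embedding model's own tokenizer will differ.

```python
def chunk_tokens(tokens, chunk_size=384, overlap_ratio=0.15):
    """Split a token list into fixed-size chunks whose edges overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")
    overlap = int(chunk_size * overlap_ratio)
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks


# Integers stand in for tokens so the overlap is easy to inspect.
tokens = list(range(1000))
chunks = chunk_tokens(tokens)
```

Each chunk starts with the last 57 tokens of the previous one, so a sentence that straddles a boundary appears whole in at least one chunk. The final chunk is simply shorter than the rest.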
The Fix: Domain Filtering Before Vector Search
The grocery list incident had a specific root cause: the knowledge base was a single flat namespace. Personal reminders, grocery lists, infrastructure decisions, and project notes all lived in the same vector space and competed on the same similarity metric. A grocery list that used task-tracking language was indistinguishable from a work note at query time.
The fix was adding a domain metadata field to every stored point — values like work, infra, personal, research — and rewriting all queries to filter by domain before running the vector search. Hybrid retrieval: metadata filter first, then vector search inside that filtered subset. The false positive rate dropped to near zero.
Before: Query "status of recent project tasks" against the full collection. Results ranked by cosine similarity across all points. Grocery list wins on vocabulary overlap.
After: Query pre-filtered to domain = "work", then vector search within that subset. Only work-related notes are in the candidate pool. The grocery list isn't even evaluated. The correct result returns in the top 3 every time.
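The before and after can be reproduced with a toy in-memory version of the pattern. This is a sketch under stated assumptions, not the real query code: the points, vectors, and the search function are invented, and a real vector database would take the metadata filter as part of the search request instead of a list comprehension.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def search(points, query_vector, domain=None, k=3):
    """Rank points by cosine similarity, optionally restricted to one domain."""
    # Metadata filter first: points outside the domain are never scored.
    candidates = [p for p in points if domain is None or p["domain"] == domain]
    ranked = sorted(
        candidates,
        key=lambda p: cosine(query_vector, p["vector"]),
        reverse=True,
    )
    return [p["text"] for p in ranked[:k]]


# Hypothetical points; vectors are hand-picked so the grocery list wins unfiltered.
points = [
    {"text": "Groceries: items pending, need to check oat milk",
     "domain": "personal", "vector": [0.9, 0.4, 0.1]},
    {"text": "Project follow-ups: two tasks pending review",
     "domain": "work", "vector": [0.8, 0.5, 0.2]},
    {"text": "Reverse proxy moved to new host",
     "domain": "infra", "vector": [0.1, 0.2, 0.9]},
]
query_vector = [0.88, 0.42, 0.12]

unfiltered = search(points, query_vector, k=1)
filtered = search(points, query_vector, domain="work", k=1)
```

Unfiltered, the top hit is the grocery list. With domain="work" it never enters the candidate pool, and the project note comes back instead.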
The lesson isn't that vector search is flawed — it's that semantic similarity across unrelated domains is the wrong retrieval problem to solve. Namespace your data. Filter before you search. The vector similarity does its job well when it's operating on a coherent domain, not a soup of everything ever written down.
Most tutorials show you happy-path retrieval. "Ask about Paris, get the Paris document." Nobody shows you the grocery list problem, where similarity is high but relevance is zero. They don't cover the metadata filter pattern that fixes it. They don't mention that domain-specific vocabulary (industry jargon, internal project names) may not embed well without fine-tuning, since embedding models are trained on general internet text. They don't tell you that a good local embedding model can outperform hosted alternatives on several benchmarks while running for free. Read the paper, not just the tutorial.
The Honest Summary
A self-hosted vector database is excellent software. Run reliably at home, it costs nothing extra and makes every session more productive by eliminating the context re-establishment overhead that silently eats 10-15 minutes of every conversation.
But the grocery list was the most useful thing that happened to this project. It broke a mental model before a production pipeline got built on a false assumption — that vector similarity equals relevance. Add a domain field. Filter before you search. Be specific in what you store. Do those three things and the memory illusion works well enough to be genuinely useful.
Skip them and you'll get grocery lists. Which, in retrospect, is a pretty good outcome for catching a fundamental design flaw early.