If you have built or scoped a retrieval system, an AI search feature, or a recommendation engine, you have run into the question of where to keep the vectors. It is one of those decisions that looks like a minor implementation detail at the start of a project and turns out to shape your costs, your latency, and your operational burden for years. This is a plain-language guide to vector databases: what they actually do, when you need a dedicated one, and how to choose without being swept along by whichever product has the loudest marketing this quarter.
Problem Statement
Traditional databases are built to find things by exact match or by range. Give them an ID, a name, or a date range, and they are superb. Ask them to find the records that are closest in meaning to a piece of text and, without a vector index, they have nothing useful to offer. Yet meaning-based search is exactly what modern AI features depend on. When a retrieval system needs the passages most relevant to a question, or a product search needs results that match intent rather than keywords, it is comparing vectors (long lists of numbers that encode meaning) and looking for the nearest ones.
Finding the nearest vectors among millions, fast enough to sit in a live request, is a genuinely hard computational problem. A vector database is the specialised tool built to solve it. The question every team faces is whether they need that specialised tool, or whether a capability bolted onto their existing database is enough.
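To make the problem concrete, here is a minimal sketch of the exact, brute-force search that an index exists to avoid. The document ids, the three-dimensional vectors, and the function names are all made up for illustration; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat a zero vector as similar to nothing.
    return dot / (norm_a * norm_b)


def exact_top_k(query, vectors, k):
    """Score every stored vector against the query and keep the k best.

    `vectors` maps an item id to its embedding. The work grows linearly
    with the number of stored vectors, which is the cost an index avoids.
    """
    scored = ((cosine_similarity(query, vec), item_id)
              for item_id, vec in vectors.items())
    return [item_id for _, item_id in heapq.nlargest(k, scored)]


# Toy, hand-written "embeddings" (illustrative values only).
documents = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "returns-form": [0.8, 0.3, 0.1],
}

nearest = exact_top_k([1.0, 0.2, 0.0], documents, k=2)
print(nearest)  # ['refund-policy', 'returns-form']
```

With three documents this is instant. With millions, every query touches every vector, which is why exact search rarely fits inside a live request.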
Industry Challenges
- Choice overload — There are dozens of options: dedicated databases, extensions to existing ones, and managed cloud services, each claiming to be the fastest or cheapest. The differences that matter are not the ones the benchmarks advertise.
- Benchmark theatre — Vendor benchmarks are run on idealised data and query patterns that rarely match yours. Headline numbers tell you little about how a store behaves on your workload.
- The recall-versus-speed trade-off — Approximate nearest-neighbour search is fast precisely because it is approximate. Push for more speed and you quietly return less relevant results. Teams often do not realise they have made this trade until quality complaints arrive.
- Operational reality — A vector store is not a fire-and-forget component. It needs indexing strategy, memory budgeting, and a plan for keeping data fresh, none of which the getting-started guide mentions.
How Vector Databases Work
At the centre of a vector database is an index built for approximate nearest-neighbour search. Rather than comparing a query against every stored vector, which would be far too slow at scale, the index organises vectors so that the search can jump quickly to the neighbourhood of the answer and examine only a small fraction of the data. The most widely used approach, known as HNSW (hierarchical navigable small world), builds a layered graph that the search walks downwards, narrowing in on the closest matches in a handful of hops.
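The core move inside a graph index can be sketched in a few lines. This is a deliberately simplified, single-layer version, not the layered structure production indexes use: it builds the graph by brute force, starts from an arbitrary entry point, and tracks only one current node. Production indexes add the upper layers and keep a list of candidates. All names and sizes here are our own illustrative choices.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_neighbour_graph(vectors, degree):
    """Link every vector to its `degree` closest vectors (brute force, for clarity)."""
    graph = []
    for i, vec in enumerate(vectors):
        others = sorted((j for j in range(len(vectors)) if j != i),
                        key=lambda j: squared_distance(vec, vectors[j]))
        graph.append(others[:degree])
    return graph


def greedy_walk(query, vectors, graph, entry):
    """Hop to whichever neighbour is closest to the query until none improves.

    Returns the index of the node where the walk stopped and the number
    of distance computations it needed to get there.
    """
    current = entry
    current_dist = squared_distance(query, vectors[current])
    computations = 1
    while True:
        best, best_dist = current, current_dist
        for neighbour in graph[current]:
            computations += 1
            d = squared_distance(query, vectors[neighbour])
            if d < best_dist:
                best, best_dist = neighbour, d
        if best == current:
            return current, computations
        current, current_dist = best, best_dist


rng = random.Random(42)
vectors = [[rng.random(), rng.random()] for _ in range(500)]
graph = build_neighbour_graph(vectors, degree=8)

query = [0.25, 0.75]
found, computations = greedy_walk(query, vectors, graph, entry=0)
print(f"stopped at node {found} after {computations} distance computations")
```

The walk stops at a node that is closer to the query than any of its own neighbours. That node is usually, but not always, the true nearest vector, which is where the word approximate comes from.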
The word approximate is the important one. These indexes trade a small, tunable amount of accuracy for an enormous gain in speed. A search-time parameter (in graph indexes, the number of candidates the search keeps in play as it walks) controls how hard the search works: turn it up and you find more of the true nearest neighbours at the cost of latency; turn it down and you go faster but miss some. Choosing where to sit on that curve, for your data and your tolerance, is one of the few decisions that genuinely matters.
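One way to see the trade-off is to measure it. The sketch below uses a simple partition-based index rather than a graph, because it is shorter to write: vectors are bucketed under randomly chosen centroids, and a parameter we call `n_probe` sets how many buckets each query scans. The data is random, and the sizes and names are assumptions for illustration. What matters is the pattern of comparing approximate results against exact ones to get recall.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_partitions(vectors, n_partitions, rng):
    """Pick random stored vectors as centroids and bucket every vector
    under its closest centroid (a crude stand-in for a trained index)."""
    centroids = rng.sample(vectors, n_partitions)
    buckets = [[] for _ in centroids]
    for idx, vec in enumerate(vectors):
        closest = min(range(n_partitions),
                      key=lambda c: squared_distance(vec, centroids[c]))
        buckets[closest].append(idx)
    return centroids, buckets


def search(query, vectors, centroids, buckets, k, n_probe):
    """Scan only the n_probe partitions whose centroids are nearest the query."""
    order = sorted(range(len(centroids)),
                   key=lambda c: squared_distance(query, centroids[c]))
    candidates = [idx for c in order[:n_probe] for idx in buckets[c]]
    ranked = sorted(candidates,
                    key=lambda idx: (squared_distance(query, vectors[idx]), idx))
    return ranked[:k], len(candidates)


def exact(query, vectors, k):
    """Ground truth: rank every vector."""
    ranked = sorted(range(len(vectors)),
                    key=lambda idx: (squared_distance(query, vectors[idx]), idx))
    return ranked[:k]


rng = random.Random(0)
vectors = [[rng.random() for _ in range(8)] for _ in range(2000)]
queries = [[rng.random() for _ in range(8)] for _ in range(20)]
centroids, buckets = build_partitions(vectors, 20, rng)


def measure(n_probe, k=10):
    """Average recall@k and average number of vectors scanned per query."""
    recall_total, scanned_total = 0.0, 0
    for q in queries:
        truth = set(exact(q, vectors, k))
        found, scanned = search(q, vectors, centroids, buckets, k, n_probe)
        recall_total += len(truth & set(found)) / k
        scanned_total += scanned
    return recall_total / len(queries), scanned_total / len(queries)


for n_probe in (1, 4, 20):
    recall, scanned = measure(n_probe)
    print(f"n_probe={n_probe:>2}  recall@10={recall:.2f}  scanned={scanned:.0f}")
```

Scanning all 20 partitions is the same as exact search, so recall is 1.0 and every vector is touched. Lower settings scan less and may miss some true neighbours. Running the same measurement on your own data, against your own index, is how you pick the setting deliberately rather than inheriting a default.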
Alongside the index, a good vector store lets you attach metadata to each vector and filter on it during search — so you can ask for the nearest vectors that also belong to a particular customer, document type, or access level. This combination of similarity and filtering is what most real features need, and it is where the practical differences between products show up.
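A small sketch of what similarity plus filtering means in practice. The `Record` type, the metadata keys, and the two-dimensional vectors are hypothetical, and the search is brute force to keep the example short. Real stores apply the same idea inside the index.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    item_id: str
    vector: list
    metadata: dict = field(default_factory=dict)


def dot(a, b):
    """Dot-product similarity (higher means more similar)."""
    return sum(x * y for x, y in zip(a, b))


def filtered_search(query, records, k, **required):
    """Return the ids of the k records most similar to the query, considering
    only records whose metadata matches every key/value in `required`."""
    allowed = [r for r in records
               if all(r.metadata.get(key) == value
                      for key, value in required.items())]
    allowed.sort(key=lambda r: dot(query, r.vector), reverse=True)
    return [r.item_id for r in allowed[:k]]


records = [
    Record("contract-a", [0.9, 0.1], {"customer": "acme", "access": "internal"}),
    Record("contract-b", [0.8, 0.2], {"customer": "globex", "access": "internal"}),
    Record("faq-1", [0.7, 0.3], {"customer": "acme", "access": "public"}),
    Record("faq-2", [0.1, 0.9], {"customer": "acme", "access": "public"}),
]

query = [1.0, 0.0]
unfiltered = filtered_search(query, records, k=2)
acme_public = filtered_search(query, records, k=2,
                              customer="acme", access="public")
print(unfiltered)   # ['contract-a', 'contract-b']
print(acme_public)  # ['faq-1', 'faq-2']
```

This version filters before ranking, so it always returns up to k permitted results. A store that filters after the approximate search can return fewer than k when most of the nearest vectors fail the filter, which is one of the practical differences worth testing between products.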
Choosing the Right Store
Start with the database you already run
For a great many applications, the right answer is the vector capability built into a database you already operate. If you run PostgreSQL, its vector extension (pgvector) handles workloads up to several million vectors comfortably, keeps your vectors alongside your relational data so you can filter and join naturally, and adds no new system to secure, back up, and monitor. We reach for this first, and most projects never need to move beyond it. We made the same argument in our guide to building a retrieval system on business data: prefer the thing you already run.
When a dedicated store earns its place
A purpose-built vector database becomes worthwhile when one of a few specific conditions holds: you are searching tens or hundreds of millions of vectors; you need very high query throughput with tight latency guarantees; or you need advanced features like hybrid search and re-ranking built in rather than assembled yourself. In those cases the specialised indexing, memory management, and horizontal scaling of a dedicated product genuinely pay off.
Managed versus self-hosted
A managed service removes the operational burden of running the database yourself, which is often the right trade for a small team, but it ties your data and your costs to a provider. Self-hosting gives you control and can be cheaper at scale, at the price of needing the expertise to run it well. The honest answer depends on your team more than on the technology.
Implementation Considerations
On performance, the dominant factors are how many vectors you hold, the dimensionality of your embeddings, and where you sit on the recall-speed curve. Smaller embeddings search faster and cost less to store, and for many tasks a smaller embedding model loses very little quality, so this is worth testing before defaulting to the largest one.
On cost, the surprise for most teams is memory. The fastest indexes keep vectors in RAM, and RAM is the expensive resource. A hundred million high-dimensional vectors in memory is a serious bill, which is why techniques that compress vectors, or that keep most of them on disk, matter at scale. Storage of the raw vectors is cheap; serving them quickly is not.
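The arithmetic is worth doing before you commit to an embedding model. The sketch below estimates the memory needed for the raw vectors alone. The dimensions (1,536 and 384) and the 8-bit compression are assumptions chosen as plausible examples, and a real index adds its own overhead on top.

```python
def raw_vector_bytes(n_vectors, dimensions, bytes_per_value=4):
    """Bytes needed to hold the vectors alone, excluding index overhead.

    The default of 4 bytes per value corresponds to 32-bit floats.
    """
    return n_vectors * dimensions * bytes_per_value


def as_gb(n_bytes):
    return n_bytes / 1e9  # decimal gigabytes


full = raw_vector_bytes(100_000_000, 1536)                       # float32
smaller = raw_vector_bytes(100_000_000, 384)                     # smaller model
quantised = raw_vector_bytes(100_000_000, 1536, bytes_per_value=1)  # 8-bit values

for label, size in [("1536-d float32", full),
                    ("384-d float32", smaller),
                    ("1536-d 8-bit", quantised)]:
    print(f"{label:>15}: {as_gb(size):,.1f} GB")
# 1536-d float32: 614.4 GB
#  384-d float32: 153.6 GB
#   1536-d 8-bit: 153.6 GB
```

Under these assumptions, either a four-times-smaller embedding or 8-bit compression cuts the footprint from roughly 614 GB to roughly 154 GB. Whether the quality loss is acceptable is something only a test on your own data can tell you.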
On security, the same rules apply as to any store of potentially sensitive data: encrypt at rest and in transit, control access tightly, and if your vectors are derived from personal or confidential information, treat them with the same care as the source. Filtering by access level at query time is essential when one index serves users with different entitlements.
On scalability, plan for growth in two dimensions — more vectors and more queries — because they are solved differently. More vectors is an indexing and memory problem; more queries is a replication and throughput problem. Knowing which you are about to hit prevents over-engineering for the one you are not.
Real-World Use Cases
- Retrieval for AI assistants — Grounding a model in your documents, the most common reason teams adopt a vector store.
- Semantic and product search — Returning results that match what a user means rather than the exact words they typed.
- Recommendation and similarity — Finding items, articles, or profiles similar to one a user is engaging with.
- Deduplication and clustering — Spotting near-duplicate records or grouping similar content at scale.
Common Mistakes to Avoid
- Reaching for a dedicated database too early — Most projects are well served by the vector capability in their existing database; adopting a separate system adds operational cost for benefits you may never need.
- Trusting vendor benchmarks — Test on your own data and query patterns; the only benchmark that matters is yours.
- Ignoring the recall-speed setting — Shipping with default index parameters and then being surprised by quality issues is common and avoidable.
- Forgetting the freshness problem — A vector index is only as good as how current it is; plan how updates and deletions flow through it.
- Over-sizing embeddings — Defaulting to the largest embedding model inflates cost and memory for accuracy you may not need.
Future Trends
Vector capabilities are steadily becoming a standard feature of general-purpose databases rather than a separate product category, which will make the dedicated-store decision relevant to fewer teams over time. Embedding models are getting smaller and cheaper without losing much quality, easing the memory pressure that drives a lot of vector-store cost. And hybrid approaches that combine meaning-based and keyword search are becoming the default rather than an advanced option, because they often produce better results than either alone.
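Hybrid search needs some way to merge two ranked lists. One widely used method is reciprocal rank fusion, sketched below. It is only one option, and we are not claiming any particular product implements it this way. The document ids are invented, and the constant of 60 is the conventional default rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of ids into one.

    Each list contributes 1 / (k + rank) for every id it contains, with
    rank starting at 1, so ids ranked well by several lists rise to the top.
    Ties are broken alphabetically to keep the output deterministic.
    """
    scores = {}
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] = scores.get(item_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


keyword_hits = ["doc-7", "doc-2", "doc-9"]   # e.g. from full-text search
vector_hits = ["doc-2", "doc-4", "doc-7"]    # e.g. from nearest-neighbour search

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)  # ['doc-2', 'doc-7', 'doc-4', 'doc-9']
```

Documents that both methods rank highly come first. Because the method uses only rank positions, it needs no calibration between keyword scores and vector distances, which is much of its appeal.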
Why Businesses Should Act Now
The features that depend on vector search — grounded AI assistants, genuinely good search, sensible recommendations — are quickly becoming table stakes rather than differentiators. Getting the foundation right early, with a store sized to your actual needs rather than your fears, means you can build these features without inheriting a costly, hard-to-operate system. The wrong choice here is rarely catastrophic, but it is expensive and slow to unwind.
Conclusion
A vector database is the engine behind meaning-based search, and choosing one is less about finding the fastest product than about matching the tool to your scale, your team, and your data. For most businesses, the vector capability in the database they already run is the right place to start, with a dedicated store reserved for genuine scale. The decisions that matter are unglamorous — embedding size, the recall-speed setting, how freshness is handled — and they are exactly where experience saves money. We help teams make these choices and build the features on top of them, and we are happy to talk through where your workload actually sits.
Frequently Asked Questions
Do we need a dedicated vector database?
Often not. If you already run a database with a vector capability, such as PostgreSQL with its vector extension, it comfortably handles workloads up to several million vectors. A dedicated store earns its place at much larger scale or when you need very high throughput and built-in advanced search.
What is approximate nearest-neighbour search?
It is the technique that makes vector search fast. Rather than comparing a query against every stored vector, it uses an index to examine only a small, well-chosen fraction, trading a small, tunable amount of accuracy for a large gain in speed.
Why does memory matter so much for vector databases?
The fastest indexes keep vectors in RAM, and RAM is the costly resource. At tens or hundreds of millions of vectors this becomes the dominant cost, which is why compression and disk-based approaches matter at scale.
How do we keep the index up to date?
Plan for updates and deletions to flow through the index rather than rebuilding it wholesale. Stale results are a common source of quality complaints, and incremental updates triggered by data changes are what keep an index trustworthy.
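A minimal sketch of what flowing changes through the index can look like, assuming your source system can tell you when a document is created, changed, or removed. The class, the fingerprinting scheme, and the `fake_embed` function are hypothetical stand-ins. A real pipeline would call an embedding model and write to your actual store.

```python
import hashlib


class IncrementalIndex:
    """Toy in-memory store that re-embeds only documents whose text changed."""

    def __init__(self, embed):
        self._embed = embed        # hypothetical embedding function
        self._vectors = {}         # doc_id -> vector
        self._fingerprints = {}    # doc_id -> hash of the embedded text

    def upsert(self, doc_id, text):
        """Insert or refresh one document; return True if work was done."""
        fingerprint = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if self._fingerprints.get(doc_id) == fingerprint:
            return False           # unchanged, so skip the embedding call
        self._vectors[doc_id] = self._embed(text)
        self._fingerprints[doc_id] = fingerprint
        return True

    def delete(self, doc_id):
        """Remove a document so it can no longer be returned."""
        self._vectors.pop(doc_id, None)
        self._fingerprints.pop(doc_id, None)

    def __contains__(self, doc_id):
        return doc_id in self._vectors

    def __len__(self):
        return len(self._vectors)


def fake_embed(text):
    # Stand-in for a real embedding model: text length and vowel count.
    return [float(len(text)), float(sum(text.count(v) for v in "aeiou"))]


index = IncrementalIndex(fake_embed)
first = index.upsert("policy", "Refunds within 30 days")    # True: new document
again = index.upsert("policy", "Refunds within 30 days")    # False: unchanged
changed = index.upsert("policy", "Refunds within 14 days")  # True: text changed
index.delete("policy")                                      # source doc removed
```

Fingerprinting the text means unchanged documents cost nothing to reprocess, so the same job can run on every change event without re-embedding the whole collection.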
Should we use the largest embedding model?
Usually not by default. Smaller embeddings search faster, cost less to store, and often lose very little quality for a given task. Test a smaller model before assuming you need the largest.
How does vector search handle permissions?
By attaching metadata such as an access level to each vector and filtering on it during search, so the index only returns results the requesting user is entitled to see. This is essential when one index serves users with different permissions.