Vector Store Choices in 2026: Qdrant vs Pinecone vs pgvector vs Weaviate vs Milvus
Choosing a vector store in 2026 is no longer a tooling question — it's a procurement decision. Here are the five stores worth shortlisting, where each earns its keep, and what to ask before you sign.
- Qdrant, Pinecone, pgvector, Weaviate, and Milvus cover the credible 2026 shortlist.
- Pick by use case: Postgres-native (pgvector), zero-ops (Pinecone), open-source on-prem (Qdrant or Milvus), hybrid retrieval (Weaviate).
- A p99 latency above 100ms means you've outgrown the store, not the index.
- Pricing posture matters more than per-vector cost — watch for read/write asymmetry.
A vector store is a database purpose-built to store, index, and query high-dimensional vectors — the numerical representations that embedding models produce from text, images, or structured data — so that similarity search can return semantically relevant results in milliseconds rather than seconds.
The category exists because traditional databases were designed around exact-match and range queries, not approximate nearest-neighbor (ANN) search. A SQL WHERE clause is the wrong tool when the question is "find the 20 documents whose embeddings are closest to this query embedding in 1536-dimensional space." Every vector store on this list provides some variant of HNSW (Hierarchical Navigable Small World) or IVF-based indexing to answer that question fast. The differences are in how they handle hybrid retrieval, how they scale, how they price, and how much you want to own the infrastructure.
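For intuition, here is the exact version of that question with no index at all: a brute-force top-20 cosine search in Python with NumPy over a random stand-in corpus. Every store on this list exists to avoid running this linear scan at production scale.

```python
import numpy as np

# Stand-in corpus: 10k documents embedded at 1536 dimensions.
rng = np.random.default_rng(0)
corpus = rng.standard_normal((10_000, 1536)).astype(np.float32)
query = rng.standard_normal(1536).astype(np.float32)

# Normalize so a dot product equals cosine similarity.
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)
query /= np.linalg.norm(query)

# Exact search: score every vector, take the top 20.
# This O(n * d) scan is exactly what an ANN index approximates.
scores = corpus @ query
top20 = np.argsort(scores)[::-1][:20]
print(top20, scores[top20])
```

An HNSW or IVF index answers the same question without touching every vector, trading a small amount of recall for orders of magnitude in latency.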
The five vendors below are the credible shortlist for production deployments in 2026. That doesn't mean one of them is right for every team — it means that choosing outside this list requires a specific reason.
What does a vector store actually do?
Beyond ANN search, a vector store provides a set of capabilities that separate "works in a notebook" from "works in production." The table below names the five functional areas every production deployment will eventually care about.
| Capability | What it means | Why it matters |
|---|---|---|
| ANN index | HNSW, IVF, or hybrid index structure for fast approximate nearest-neighbor search | Base performance — query latency and throughput ceiling |
| Hybrid retrieval | Combined vector similarity + keyword (BM25/full-text) scoring in a single query | Most production RAG systems benefit from sparse + dense signals; pure vector search underperforms on keyword-heavy corpora |
| Filtering | Restrict candidates by metadata fields, before (pre-filter) or after (post-filter) the ANN search | Required once you have multi-tenant or permission-scoped data |
| Persistence and replication | Data survives restarts; optional read replicas for query throughput | Determines whether the store is prod-safe without external backups |
| On-prem / self-hosted option | Ability to run within your own VPC or data center | Compliance, data residency, and cost control at the high end |
These aren't nice-to-haves. A store that lacks metadata filtering forces application-layer filtering over full result sets — that's a performance cliff at any serious cardinality. A store that lacks a self-hosted option may be off-limits if your data can't leave your environment.
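As a concrete illustration of in-engine filtering, here is a filtered ANN query using the Qdrant Python client. The collection name, payload key, and zero-filled query vector are stand-ins; the point is that the filter is evaluated inside the store, so non-matching points never become candidates.

```python
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

client = QdrantClient(url="http://localhost:6333")

# ANN search scoped to a single tenant: the filter runs inside the
# engine rather than over a full result set in application code.
hits = client.search(
    collection_name="docs",        # hypothetical collection
    query_vector=[0.0] * 1536,     # stand-in; use a real embedding
    query_filter=Filter(
        must=[FieldCondition(key="tenant_id", match=MatchValue(value="acme"))]
    ),
    limit=20,
)
```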
How do the five compare on features?
The matrix below rates each vendor across the five features that most often determine the shortlist. Ratings are relative to the 2026 baseline, not absolute; "good" means competitive with the best in class.
| Vendor | Hybrid retrieval | Scale ceiling | Deployment model | Pricing posture | On-prem support |
|---|---|---|---|---|---|
| Qdrant | Good (sparse + dense, native) | High (distributed mode, sharding) | Self-hosted or managed cloud | Free self-hosted; cloud metered | Yes — Docker, Kubernetes, binary |
| Pinecone | Good (sparse-dense hybrid) | Very high (serverless handles bursty scale) | Managed cloud only | Serverless (per-read/write unit) + pod-based | No |
| pgvector | BYO (compose with Postgres FTS) | Medium (Postgres-bound) | Anywhere Postgres runs | Open source; pay for your Postgres | Yes — runs wherever Postgres does |
| Weaviate | Excellent (BM25 + vector, ranked fusion built in) | High (horizontal sharding) | Self-hosted or managed cloud | Free self-hosted; cloud metered | Yes — Docker, Kubernetes |
| Milvus | Good (sparse + dense in Milvus 2.x) | Very high (designed for billion-scale) | Self-hosted or managed (Zilliz Cloud) | Open source; Zilliz Cloud pay-as-you-go | Yes — purpose-built for self-host |
A few notes behind the cells.
Qdrant (qdrant.tech) shipped sparse vector support as a first-class feature, which means hybrid retrieval doesn't require a separate index or a second pass. The payload filtering system is composable and fast — it's one of Qdrant's design strengths. The cloud offering (Qdrant Cloud) runs on AWS and GCP. On GitHub, Qdrant is currently the most-starred pure-vector-database repository (37k+ stars as of early 2026), which is a reasonable proxy for community momentum.
Pinecone (pinecone.io) is the managed-only choice. You don't run infrastructure; you send vectors and get results. The serverless tier (GA since 2024) handles variable workloads well and removed the previous penalty for low-traffic indexes. The sparse-dense hybrid (Pinecone's "sparse-dense vectors") is competitive. The tradeoff is that you're fully on Pinecone's platform — no on-prem path, no portable data without export.
pgvector (github.com/pgvector/pgvector) is the Postgres extension that adds a vector column type plus HNSW and IVFFlat index support. If your application already has a Postgres database, pgvector is the cheapest possible path to vector search — no new service, no new ops surface, same connection pool. The ceiling is Postgres's ceiling, which matters for very large corpora or high-QPS workloads. Hybrid retrieval is composed manually: vector search via pgvector, keyword search via tsvector/pg_search, ranked and merged in application code.
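A minimal sketch of that composition using psycopg, assuming a hypothetical docs table with content text and embedding vector(1536) columns plus an HNSW index. The two legs run as separate queries and are merged with Reciprocal Rank Fusion in Python.

```python
import psycopg

K = 20       # results to return
RRF_K = 60   # standard Reciprocal Rank Fusion constant

def hybrid_search(conn: psycopg.Connection, text: str, emb: list[float]) -> list:
    with conn.cursor() as cur:
        # Dense leg: pgvector cosine distance (the <=> operator), HNSW-indexed.
        cur.execute(
            "SELECT id FROM docs ORDER BY embedding <=> %s::vector LIMIT %s",
            (str(emb), K),
        )
        dense = [r[0] for r in cur.fetchall()]

        # Sparse leg: Postgres full-text search, ranked by ts_rank.
        cur.execute(
            """SELECT id FROM docs
               WHERE to_tsvector('english', content) @@ plainto_tsquery('english', %s)
               ORDER BY ts_rank(to_tsvector('english', content),
                                plainto_tsquery('english', %s)) DESC
               LIMIT %s""",
            (text, text, K),
        )
        sparse = [r[0] for r in cur.fetchall()]

    # Merge in application code: score = sum of 1 / (RRF_K + rank).
    scores: dict = {}
    for leg in (dense, sparse):
        for rank, doc_id in enumerate(leg):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (RRF_K + rank)
    return sorted(scores, key=scores.get, reverse=True)[:K]
```

In production you would store a precomputed tsvector column with a GIN index rather than recomputing to_tsvector per query; the inline form keeps the sketch short.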
Weaviate (weaviate.io) has invested heavily in hybrid search, and it shows. The BM25 + vector ranked fusion (using Reciprocal Rank Fusion by default) is the most polished in-box hybrid experience among the five. Weaviate also ships a built-in generative-search module that can call an LLM for each result — useful if you're building Q&A pipelines without a separate orchestration layer. The schema model is more opinionated than Qdrant or Milvus, which slows initial setup but provides structure that pays off in multi-tenant deployments.
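For contrast with the pgvector composition above, the equivalent in Weaviate is a single call. A sketch assuming the v4 Python client and a hypothetical Docs collection; alpha weights the vector signal against BM25 (0 = pure keyword, 1 = pure vector), and the fusion happens server-side.

```python
import weaviate

client = weaviate.connect_to_local()      # or connect_to_weaviate_cloud(...)
docs = client.collections.get("Docs")     # hypothetical collection

# One call: BM25 + vector search, fused server-side.
response = docs.query.hybrid(
    query="error E1047 in ingestion worker",  # keyword-heavy query
    alpha=0.5,
    limit=10,
)
for obj in response.objects:
    print(obj.uuid, obj.properties)

client.close()
```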
Milvus (milvus.io) was built from the start for billion-scale deployments. The architecture is disaggregated — separate nodes for query, data, index, and coordination — which allows independent horizontal scaling of each layer. This is overkill for most teams but correct for the teams that need it. The managed offering, Zilliz Cloud (zilliz.com), adds the operational layer on top of the open-source core. Milvus 2.x ships sparse vector support (for hybrid), but the hybrid experience is less plug-and-play than Weaviate's.
Which vector store fits which use case?
The right choice follows directly from your constraints. Work through the decision points in order.
Start at infrastructure posture. If you cannot send data to an external managed service — compliance, data residency, contractual — your shortlist is Qdrant, Weaviate, pgvector, or Milvus. Pinecone is out. If managed-cloud is acceptable, all five are on the table.
Then check whether you already run Postgres. If you do, and your corpus is under roughly 10 million vectors with modest QPS requirements (under 500 queries per second sustained), pgvector is the right starting point. You avoid a new service, a new ops surface, and a new pricing contract. The limitation is real — pgvector's ANN performance degrades at very high cardinality compared to purpose-built stores — but most teams don't hit it.
If you need zero operational overhead, Pinecone is the correct choice. The serverless tier means you pay for what you use, time to first query is minutes, and you never get paged for an index. The managed cloud story for Qdrant and Weaviate is also viable, but Pinecone's managed experience is the most polished.
If hybrid retrieval is a first-class requirement, Weaviate is the leading choice. Keyword recall matters on corpora with technical identifiers, short strings, or proper nouns that don't embed reliably. BM25 + vector fusion in Weaviate is in-box; on the other stores it requires more assembly.
If you're building at billion-vector scale, Milvus is the purpose-built answer. The disaggregated architecture allows query and storage scaling to diverge, which is not possible in the others at that scale.
If you want the best balance of open-source maturity, hybrid retrieval, and community momentum, Qdrant is the current default choice for new projects that don't hit the specific conditions above. The community is large, the documentation is current, the filtering system is fast, and the cloud offering is competitive. Diagest's data pipeline produces Qdrant-compatible vector outputs, which reflects the same reasoning applied at the pipeline layer.
What about cost at scale?
Pricing posture matters more than the published per-vector number. Three pricing models are in play across this field, and they have different failure modes at scale.
Per-read/write unit (Pinecone serverless) charges for operations. Reads and writes are priced separately. Write-heavy workloads — a live ingestion pipeline pushing new documents continuously — run up write costs quickly. Read-heavy workloads — a Q&A system with millions of daily queries — run up read costs. Pinecone's pricing page (pinecone.io/pricing) publishes current unit rates; at the time of writing, serverless charges per-read-unit and per-write-unit with a free tier. The predictability risk is that costs scale linearly with usage, which is either comfortable (you can model it) or alarming (you can't predict traffic).
Per-pod / reserved compute (Pinecone pods, Zilliz Cloud, Weaviate Cloud) charges for allocated capacity, not for individual operations. You pay for a node or cluster regardless of whether it's idle. This is cheaper per-query under sustained high load but more expensive at low utilization. Pod pricing converts variable operational cost into a fixed infrastructure cost — which finance teams often prefer for budget predictability, but which hurts early-stage projects where traffic is uncertain. Pinecone's pod pricing is documented alongside serverless; Zilliz Cloud (zilliz.com/pricing) and Weaviate Cloud (weaviate.io/pricing) publish their cluster tier rates.
Open source / self-hosted (Qdrant, Milvus, pgvector, Weaviate) has zero licensing cost. You pay for the compute you run — EC2, GKE nodes, RDS Postgres. This is cheapest at high utilization, most operationally demanding, and requires someone who knows the index tuning dials. The real cost of self-hosted is engineering time: index rebuilds, shard rebalancing, node failure handling, version upgrades. For teams with strong infrastructure muscle, this is cheaper than managed at any reasonable scale. For teams without it, the hidden operational costs often exceed the managed-service fee.
A specific cost pattern to watch: read/write asymmetry in managed pricing. If your workload is 90% reads (a finished corpus, queries against it), per-read-unit pricing can be very expensive. If your workload is continuous write-heavy (live document ingestion, frequent re-embeds), per-write-unit pricing bites. Before you commit to a managed tier, model your actual read/write ratio against the vendor's published rates. Most teams skip this step and then find the bill in month three.
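That modeling is a few lines of arithmetic, as sketched below. The unit rates are placeholders, not any vendor's actual prices; substitute the published rates from the pricing pages above, and note that real unit accounting also scales with vector dimensions and result counts, so treat this as the shape of the calculation.

```python
# Placeholder unit rates -- NOT real vendor prices.
READ_UNIT_PRICE = 4.00 / 1_000_000    # $ per read unit (placeholder)
WRITE_UNIT_PRICE = 16.00 / 1_000_000  # $ per write unit (placeholder)

def monthly_cost(reads_per_day: float, writes_per_day: float) -> float:
    """Rough monthly bill under a per-read/write-unit pricing model."""
    return 30 * (reads_per_day * READ_UNIT_PRICE
                 + writes_per_day * WRITE_UNIT_PRICE)

# Read-heavy: a finished corpus serving queries.
print(monthly_cost(reads_per_day=9_000_000, writes_per_day=1_000_000))

# Write-heavy: continuous ingestion with frequent re-embeds.
print(monthly_cost(reads_per_day=1_000_000, writes_per_day=9_000_000))
```

If write units cost several times what read units do (a common asymmetry), the two workloads above produce very different bills at identical total operation counts.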
At Qdrant Cloud's published rates (cloud.qdrant.io), clusters are priced by node size and region. For a medium-scale deployment (10M vectors, 1536 dimensions, p99 latency under 50ms), a single-node cluster in the starter tier is under $100/month; a production-grade multi-node cluster is in the $300–600/month range depending on region and replicas. These numbers change — check the current pricing page before planning a budget.
The short version: model your workload before you pick a pricing model, not after. Per-unit pricing is comfortable for low, bursty, or unpredictable traffic. Per-pod is comfortable for steady, predictable, high-utilization workloads. Self-hosted is correct when you have the infrastructure team and the scale to justify it.
What should I be asking a vendor I'm evaluating?
Standard demos show the happy path. These five questions probe the edges.
1. What is your ANN recall at my dataset size and my target p99 latency? Published benchmarks are run on ANN Benchmarks datasets (SIFT1M, GIST1M) that are cleaner and more uniform than most production corpora. Ask the vendor to run their stack against a sample of your actual data, with your actual vector dimensions, at your target query volume. The HNSW ef_search parameter trades recall for latency — the vendor should be able to tell you where that dial needs to sit for your workload. A sketch of that recall measurement follows this list.
2. How does metadata filtering interact with ANN performance at scale? Pre-filtering (filter before ANN) and post-filtering (filter after ANN) have different recall characteristics. Pre-filtering on a highly restrictive filter (e.g., "tenant_id = X" where X matches 0.1% of vectors) can leave very few candidates for the ANN search, which tanks recall. Most stores offer both modes; the right choice depends on your filter cardinality. Ask the vendor to show you the degradation curve.
3. What does an index migration look like? At some point you will change your embedding model or your schema. The migration path — re-embedding the corpus and reindexing under a new collection — is either a documented, zero-downtime procedure or it isn't. Ask how large customers have handled model upgrades. If the answer is "delete and recreate," you need a dual-write strategy in your application layer before you commit; a sketch follows this list.
4. What is the failure mode when the index node goes down? For managed stores, ask about SLA and recovery time objective. For self-hosted, ask what happens to in-flight queries and what the replication lag looks like during node failure. Qdrant, Milvus, and Weaviate all have distributed modes with replica shards; the failover behavior is documented in their respective architecture guides.
5. What does the data export look like? If you switch vendors in eighteen months — because the pricing changes, because the product gets acquired, because your scale requires something different — how do you get your vectors out? Qdrant, Milvus, and Weaviate all support snapshot exports and collections-as-files. Pinecone added an export API. pgvector's data is in Postgres — export is a pg_dump. Know the answer before you're in a situation where you need it.
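Two of these are worth sketching in code.

For question 1, measuring recall on your own data needs only exact ground truth: brute-force a sample of queries, run the ANN index at a candidate ef_search setting, and compare. A minimal sketch in Python; ann_ids stands in for whatever ID list your store returns.

```python
import numpy as np

def recall_at_k(ground_truth_ids, ann_ids) -> float:
    """Fraction of the true top-k neighbors the ANN search returned."""
    return len(set(ground_truth_ids) & set(ann_ids)) / len(ground_truth_ids)

def exact_top_k(corpus: np.ndarray, query: np.ndarray, k: int) -> np.ndarray:
    """Brute-force ground truth: exact cosine top-k over normalized vectors."""
    return np.argsort(corpus @ query)[::-1][:k]

# Sweep the dial: re-run the same queries at several ef_search settings
# and record recall_at_k for each. The right setting is the smallest one
# that meets your recall target within your p99 latency budget.
```

For question 3, the dual-write pattern is small enough to show in full. The VectorStore protocol here is hypothetical; map it onto your client's actual upsert and search calls. Writes go to both collections during backfill, and reads stay on the old collection until the new one is verified.

```python
from typing import Any, Protocol

class VectorStore(Protocol):
    """Hypothetical minimal interface -- substitute your client's real calls."""
    def upsert(self, doc_id: str, vector: list[float], payload: dict) -> None: ...
    def search(self, vector: list[float], limit: int) -> list[Any]: ...

class DualWriteStore:
    """Write to both collections during migration; read from old until cutover."""

    def __init__(self, old: VectorStore, new: VectorStore) -> None:
        self.old, self.new = old, new
        self.cutover = False  # flip after backfill and verification complete

    def upsert(self, doc_id: str, old_vec: list[float], new_vec: list[float],
               payload: dict) -> None:
        # For an embedding-model upgrade, embed each document under both
        # models until cutover; for a pure schema change the vectors match.
        self.old.upsert(doc_id, old_vec, payload)
        self.new.upsert(doc_id, new_vec, payload)

    def search(self, query_vec: list[float], limit: int) -> list[Any]:
        # The caller must embed the query with whichever model matches
        # the collection currently serving reads.
        target = self.new if self.cutover else self.old
        return target.search(query_vec, limit)
```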
These questions won't make the decision for you, but they will surface the answers that vendors are otherwise unlikely to volunteer. The store that handles all five well is the one that's been in production long enough to have broken each of them.
Which vector store should I choose if I'm just starting?
If you already run Postgres, start with pgvector — the cheapest path. If you don't, start with Qdrant for self-hosted or Pinecone for managed.
How do these compare on hybrid (BM25 + vector) retrieval?
Weaviate has the most mature hybrid implementation. Qdrant and Pinecone both ship native hybrid support. pgvector's hybrid story is BYO — you compose it with Postgres full-text search.
What's the failure mode at scale?
Recall degradation under HNSW with high cardinality is the most common. Most stores expose ef_search tuning; the right value depends on your dataset and your latency budget.
Want to skip the work?
Diagest absorbs the parse / clean / dedup / chunk / embed work and hands your AI exactly what it needs.
Contact us now →