Most vector database comparisons turn into feature checklists. That is not how the decision actually gets made in production. The real questions are how hybrid search behaves under load, how filtered queries scale when every tenant has its own subset, how much DevOps capacity the team has to spare, and what the bill looks like at 10x today's volume. This is the 2026 head-to-head for Pinecone, Weaviate, pgvector, and Qdrant — with real performance numbers, the pricing patterns that bite, and a straight answer on when pgvector is simply good enough.
The field, in one table
Each of the four represents a different bet on who runs the infrastructure, how hybrid search should work, and where filtering should happen.
| Database | Shape | Best for | Watch out for |
|---|---|---|---|
| Pinecone | Fully managed SaaS | Teams that want zero ops, serverless scaling | Filtered-query latency, per-QPS cost at scale |
| Weaviate | Managed or self-hosted | Built-in hybrid search, modular architecture | More concepts to learn, ops overhead if self-hosted |
| pgvector | Postgres extension | Already running Postgres, under 10M vectors | Performance degrades past tens of millions |
| Qdrant | Managed or self-hosted, Rust-native | Filtered search performance, cost at scale | Smaller ecosystem than Pinecone |
The best vector database is the one the team can operate confidently at the scale the product will actually reach. Benchmarks on a laptop with ten million random vectors tell you almost nothing about how any of these behave at 99% recall under filtered multi-tenant load.
Latency and recall under load
Raw query latency is where the four separate. Published benchmarks in 2026 put all four in the same order of magnitude at modest scale, but divergence appears under filtering, at higher recall thresholds, and on multi-tenant workloads.
| Database | p50 latency (1M vectors, 99% recall) | Filtered-query cost | Hybrid search |
|---|---|---|---|
| Qdrant | ~4–5 ms | Low — filtering is first-class | Native (vector + payload + BM25) |
| pgvector (HNSW) | ~5–8 ms | Low when filter matches an index | Native (via tsvector + vector) |
| Weaviate | ~20–40 ms | Moderate | Native, well-tuned |
| Pinecone (serverless) | ~30–80 ms | Higher — filter evaluation adds latency | Supported via sparse-dense vectors |
Two things worth flagging before anyone quotes these numbers in a meeting: Pinecone's pod-based tier is meaningfully faster than serverless but significantly more expensive, and every one of these numbers shifts once end-to-end embedding generation gets added. Total search latency including embedding time is typically 200–400 ms regardless of which database sits underneath.
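That end-to-end point is easy to sanity-check with a back-of-envelope budget. A minimal sketch — every number below is an illustrative assumption, not a measurement:

```python
# Back-of-envelope latency budget for one RAG search request.
# Every number here is an illustrative assumption, not a benchmark.
BUDGET_MS = {
    "embed_query": 250.0,        # hosted embedding API round trip
    "network": 15.0,             # app -> vector DB round trip
    "vector_query": 5.0,         # p50 at 1M vectors for the faster engines
    "rerank_and_format": 30.0,   # post-processing before the response
}

total = sum(BUDGET_MS.values())
db_share = BUDGET_MS["vector_query"] / total

print(f"total: {total:.0f} ms, vector query share: {db_share:.1%}")
```

Even swapping the 5 ms database for an 80 ms one moves this total by about a quarter; the embedding stage is what dominates the user-visible number.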
Pricing at scale
Sticker pricing is easy. Real-world economics at 10M vectors, at 100 QPS, with filtering and metadata, look different. A few patterns hold up across client projects:
- Managed services cost 1.5–3x self-hosted at 10M vectors once sustained QPS is in the hundreds.
- Pinecone serverless is competitive below 10 QPS and gets expensive as sustained QPS climbs; it is the right pick when the workload is bursty and idle most of the day.
- Qdrant and Weaviate self-hosted on existing Kubernetes are the cheapest options at scale if the team has the DevOps capacity.
- pgvector on an existing Postgres cluster is effectively free until index memory becomes the bottleneck.
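Those patterns are easy to play with in a toy cost model. Every price below is an illustrative placeholder, not a vendor quote — substitute real numbers before using this in a budget:

```python
# Toy monthly cost model for comparing managed vs self-hosted at scale.
# All prices are illustrative placeholders, not vendor quotes.

def managed_cost(vectors_m: float, qps: float,
                 per_m_vectors: float = 70.0, per_qps: float = 25.0) -> float:
    """Managed SaaS: pay for stored vectors plus sustained query throughput."""
    return vectors_m * per_m_vectors + qps * per_qps

def self_hosted_cost(node_cost: float = 450.0, nodes: int = 2,
                     ops_hours: float = 10.0, eng_rate: float = 90.0) -> float:
    """Self-hosted: instances plus the engineering time nobody budgets for."""
    return node_cost * nodes + ops_hours * eng_rate

m = managed_cost(vectors_m=10, qps=100)   # 10M vectors at 100 sustained QPS
s = self_hosted_cost()
print(f"managed ${m:.0f}/mo vs self-hosted ${s:.0f}/mo ({m / s:.1f}x)")
```

With these placeholder rates the managed option lands at roughly 1.8x self-hosted, inside the 1.5–3x range above — the useful part is not the constants but seeing which term dominates as QPS grows.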
If the team is already running Postgres and the corpus is under 10M vectors, pgvector is almost certainly good enough. HNSW indexing, filtered queries that hit existing Postgres indexes, and no new infrastructure to operate — the dedicated vector database can wait until measured performance says otherwise.
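For teams taking that path, the whole setup is two statements. A sketch of their shape, shown as Python strings so the syntax is visible; the `documents` table and its columns are hypothetical, so adapt them to your schema:

```python
# Sketch of the two statements a pgvector deployment needs: an HNSW index
# and a filtered nearest-neighbour query. Table and column names below
# are hypothetical; adapt to your schema.

CREATE_INDEX = """
CREATE INDEX CONCURRENTLY docs_embedding_hnsw
    ON documents USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
"""

# The tenant_id predicate can use an ordinary B-tree index; depending on
# selectivity, the planner filters before the vector scan or after it.
FILTERED_QUERY = """
SELECT id, content
FROM documents
WHERE tenant_id = %(tenant)s
ORDER BY embedding <=> %(query_vec)s
LIMIT 10;
"""

print(CREATE_INDEX.strip().splitlines()[0])
```

`<=>` is pgvector's cosine-distance operator; `m` and `ef_construction` are the standard HNSW build knobs, and the values shown are common starting points rather than tuned settings.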
Hybrid search and filtering
Hybrid search — combining dense vector similarity with keyword BM25 and sometimes rerankers — has become the default for RAG systems that need to handle both fuzzy semantic matches and precise exact-keyword queries. The four databases handle it differently.
- Weaviate's hybrid search is the most polished out of the box — one query blends vector and BM25 with a tunable alpha.
- Qdrant pairs vector search with structured payload filters at very low cost and integrates well with external BM25 stages.
- pgvector gets hybrid for free via tsvector full-text search combined with vector ordering in the same query.
- Pinecone supports hybrid through sparse-dense vectors, but the plumbing requires a separate sparse-encoder pipeline.
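Weaviate's alpha blend is easy to reason about with a toy reimplementation. A minimal sketch in the style of that parameter — the scores are made-up numbers assumed already normalized to [0, 1], not output from any real API:

```python
# Toy hybrid-score fusion in the style of Weaviate's alpha parameter:
# alpha = 1.0 is pure vector search, alpha = 0.0 is pure BM25.

def hybrid_score(vector_score: float, bm25_score: float,
                 alpha: float = 0.5) -> float:
    """Blend two normalized [0, 1] scores into one ranking key."""
    return alpha * vector_score + (1 - alpha) * bm25_score

# Hypothetical per-document (vector, bm25) scores for one query.
docs = {
    "doc_a": (0.92, 0.10),  # strong semantic match, weak keyword match
    "doc_b": (0.40, 0.95),  # exact keyword hit, weak semantic match
}

for alpha in (0.25, 0.75):
    ranked = sorted(docs, key=lambda d: hybrid_score(*docs[d], alpha),
                    reverse=True)
    print(f"alpha={alpha}: {ranked}")
```

The point of the toy: at low alpha the keyword hit wins, at high alpha the semantic match wins, and tuning that one number is the whole hybrid-search API surface. (Weaviate's production fusion also supports rank-based blending; this sketch shows only the score-weighted idea.)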
Filtering matters more than raw speed
A multi-tenant SaaS app almost always runs filtered queries (customer ID, document type, date range), and filter performance dominates real-world latency. Qdrant and pgvector (with proper indexing) handle filters cleanly because the filter evaluates against dedicated indexed structures. Pinecone's metadata filtering adds noticeable latency at scale because the filter is evaluated while traversing the vector index rather than against a separate index.
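Why filter order matters is visible even in a brute-force toy. A sketch with synthetic one-dimensional "embeddings" — all data here is random and purely illustrative:

```python
# Why filter order matters: post-filtering a global top-k result set can
# starve a small tenant of results, while pre-filtering cannot.

import random

random.seed(7)

# 1000 synthetic points: (id, tenant, 1-D "embedding"); "acme" owns ~2%.
points = [(i, "acme" if random.random() < 0.02 else "other", random.random())
          for i in range(1000)]
query, k = 0.5, 10

def post_filter(tenant: str) -> list:
    top = sorted(points, key=lambda p: abs(p[2] - query))[:k]  # search first
    return [p for p in top if p[1] == tenant]                  # then filter

def pre_filter(tenant: str) -> list:
    mine = [p for p in points if p[1] == tenant]               # filter first
    return sorted(mine, key=lambda p: abs(p[2] - query))[:k]   # then search

print(len(post_filter("acme")), "results post-filtered vs",
      len(pre_filter("acme")), "pre-filtered")
```

A tenant owning 2% of the corpus rarely lands in the global top 10, so the post-filtered result set comes back short or empty; pre-filtering always returns up to k results. Real engines avoid the brute-force scan, but the ordering trade-off is the same one the table above is pricing.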
Operational complexity
The hidden cost of a vector database is the cost of operating it. Index rebuilds, version upgrades, snapshots, monitoring, and capacity planning all take engineering time. The right pick depends on how much of that the team wants to own.
| Database | Ops burden | Multi-tenant isolation | Backups and recovery |
|---|---|---|---|
| Pinecone | Near zero | Namespaces per tenant | Managed, limited control |
| Weaviate Cloud | Low | Multi-tenancy is first-class | Managed |
| pgvector | Already handled by Postgres ops | Row-level security or per-tenant tables | Whatever Postgres backup already does |
| Qdrant Cloud | Low | Collections or payload filters | Managed, snapshot API available |
pgvector at scale
pgvector has crossed the line from 'fine for prototypes' to 'genuinely fine for a lot of production workloads' over the past two years. HNSW indexes build in reasonable time up to tens of millions of vectors; queries are milliseconds once the index fits in shared buffers. The limit is memory, not correctness — HNSW indexes consume 2–5x the memory of IVFFlat, and when the index spills to disk, query latency climbs sharply.
- Under 10M vectors — pgvector is almost always the right default, especially on teams already running Postgres.
- 10M–50M vectors — pgvector still works on beefy instances, but the ops cost of tuning Postgres for vector workloads starts to approach the cost of a dedicated database.
- 50M+ vectors — reach for Qdrant, Weaviate, or Pinecone. pgvector can get there but the margin for error shrinks.
pgvector's HNSW is memory-hungry. Budget at least 1.5x the raw index size in shared buffers, and plan for a rebuild window whenever dimensions or the distance metric change. A forgotten REINDEX on a large table can stall a production database for longer than anyone expects.
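A rough sizing formula helps budget that memory before the index exists. A sketch using the commonly cited per-element layout (the vector itself plus neighbour link lists); the constants are ballpark approximations, not pgvector's exact on-disk format:

```python
# Rough HNSW index size estimate: per element, the stored vector plus
# neighbour link lists. Constants are ballpark, not exact pgvector layout.

def hnsw_index_gb(n_vectors: int, dim: int, m: int = 16,
                  bytes_per_float: int = 4, bytes_per_link: int = 8) -> float:
    vector_bytes = dim * bytes_per_float
    # Layer 0 keeps 2*m links per element; upper layers add roughly m more.
    link_bytes = (2 * m + m) * bytes_per_link
    return n_vectors * (vector_bytes + link_bytes) / 1e9

idx = hnsw_index_gb(n_vectors=10_000_000, dim=768)
print(f"~{idx:.1f} GB index; budget >= {1.5 * idx:.1f} GB of shared buffers")
```

At 10M 768-dimension vectors this lands in the mid-30s of GB, which is exactly the regime where "the index fits in shared buffers" stops being free and the ops-cost comparison above kicks in.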
How we pick
When our team walks into a new project and the question is which vector database to use, the decision usually falls out of four questions.
- Is the team already running Postgres and the corpus under 10M vectors? Use pgvector. Ship it.
- Is ops capacity the scarce resource and the workload bursty? Use Pinecone serverless. Pay the premium for zero ops and move on.
- Does the workload need first-class hybrid search with a simple API? Use Weaviate. The hybrid implementation is the most polished of the four.
- Is filter-heavy multi-tenant search at scale the core workload? Use Qdrant. Filtering is its strongest suit and self-hosted economics are the cheapest at volume.
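The four questions collapse into a short decision function. A sketch that just encodes the heuristics above in order — the thresholds are this article's rules of thumb, not hard limits:

```python
def pick_vector_db(runs_postgres: bool, corpus_m: float,
                   bursty_low_ops: bool, needs_hybrid: bool,
                   filter_heavy_multitenant: bool) -> str:
    """Encode the four decision questions as ordered heuristics.

    corpus_m is corpus size in millions of vectors; thresholds are
    rules of thumb, not hard limits.
    """
    if runs_postgres and corpus_m < 10:
        return "pgvector"
    if bursty_low_ops:
        return "pinecone-serverless"
    if needs_hybrid:
        return "weaviate"
    if filter_heavy_multitenant:
        return "qdrant"
    return "benchmark on real traffic"

print(pick_vector_db(runs_postgres=True, corpus_m=5, bursty_low_ops=False,
                     needs_hybrid=False, filter_heavy_multitenant=True))
```

Note the ordering is deliberate: the pgvector question comes first because it short-circuits everything else, which is the article's actual thesis.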
Key takeaways
- All four databases are fast enough for most RAG workloads. The decision is driven by ops capacity, filtering patterns, and scale, not raw latency.
- pgvector is the right default under 10M vectors — simpler, cheaper, and already handled by existing Postgres operations.
- Pinecone serverless is the right default when the team cannot afford to operate anything and the workload is bursty.
- Qdrant wins on filtered search and self-hosted cost at scale. Weaviate wins on out-of-the-box hybrid search.
- Benchmarks on synthetic data mislead. Measure on real traffic with real filters before committing.