Most vector database comparisons turn into feature checklists. That is not how the decision actually gets made in production. The real questions are how hybrid search behaves under load, how filtered queries scale when every tenant has its own subset, how much DevOps capacity the team has to spare, and what the bill looks like at 10x today's volume. This is the 2026 head-to-head for Pinecone, Weaviate, pgvector, and Qdrant — with real performance numbers, the pricing patterns that bite, and a straight answer on when pgvector is simply good enough.
The field, in one table
Each of the four represents a different bet on who runs the infrastructure, how hybrid search should work, and where filtering should happen.
| Database | Shape | Best for | Watch out for |
|---|---|---|---|
| Pinecone | Fully managed SaaS | Teams that want zero ops, serverless scaling | Filtered-query latency, per-QPS cost at scale |
| Weaviate | Managed or self-hosted | Built-in hybrid search, modular architecture | More concepts to learn, ops overhead if self-hosted |
| pgvector | Postgres extension | Already running Postgres, under 10M vectors | Performance degrades past tens of millions |
| Qdrant | Managed or self-hosted, Rust-native | Filtered search performance, cost at scale | Smaller ecosystem than Pinecone |
The best vector database is the one the team can operate confidently at the scale the product will actually reach. Benchmarks on a laptop with ten million random vectors tell you almost nothing about how any of these behave at 99% recall under filtered multi-tenant load.
Latency and recall under load
Raw query latency is where the four separate. Published benchmarks in 2026 put all four in the same order of magnitude at modest scale, but divergence appears under filtering, at higher recall thresholds, and on multi-tenant workloads.
| Database | p50 latency (1M vectors, 99% recall) | Filtered-query cost | Hybrid search |
|---|---|---|---|
| Qdrant | ~4–5 ms | Low — filtering is first-class | Native (vector + payload + BM25) |
| pgvector (HNSW) | ~5–8 ms | Low when filter matches an index | Native (via tsvector + vector) |
| Weaviate | ~20–40 ms | Moderate | Native, well-tuned |
| Pinecone (serverless) | ~30–80 ms | Higher — filter evaluation adds latency | Supported via sparse-dense vectors |
Two things worth flagging before anyone quotes these numbers in a meeting: Pinecone's pod-based tier is meaningfully faster than serverless but significantly more expensive, and every one of these numbers shifts once end-to-end embedding generation gets added. Total search latency including embedding time is typically 200–400 ms regardless of which database sits underneath.
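That end-to-end point is easy to sanity-check with a back-of-envelope budget. A minimal sketch — every number below is an illustrative assumption, not a measurement:

```python
# Back-of-envelope latency budget for one RAG search request.
# Every number here is an illustrative assumption, not a benchmark.
BUDGET_MS = {
    "embed_query": 250.0,        # hosted embedding API round trip
    "network": 15.0,             # app -> vector DB round trip
    "vector_query": 5.0,         # p50 at 1M vectors for the faster engines
    "rerank_and_format": 30.0,   # post-processing before the response
}

total = sum(BUDGET_MS.values())
db_share = BUDGET_MS["vector_query"] / total

print(f"total: {total:.0f} ms, vector query share: {db_share:.1%}")
```

Even swapping the 5 ms database for an 80 ms one moves this total by about a quarter; the embedding stage is what dominates the user-visible number.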
Pricing at scale
Sticker pricing is easy. Real-world economics at 10M vectors, at 100 QPS, with filtering and metadata, look different. A few patterns hold up across client projects:
- Managed services cost 1.5–3x self-hosted at 10M vectors once sustained QPS is in the hundreds.
- Pinecone serverless is competitive below 10 QPS and gets expensive as sustained QPS climbs; it is the right pick when the workload is bursty and idle most of the day.
- Qdrant and Weaviate self-hosted on existing Kubernetes are the cheapest options at scale if the team has the DevOps capacity.
- pgvector on an existing Postgres cluster is effectively free until index memory becomes the bottleneck.
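Those patterns are easy to play with in a toy cost model. Every price below is an illustrative placeholder, not a vendor quote — substitute real numbers before using this in a budget:

```python
# Toy monthly cost model for comparing managed vs self-hosted at scale.
# All prices are illustrative placeholders, not vendor quotes.

def managed_cost(vectors_m: float, qps: float,
                 per_m_vectors: float = 70.0, per_qps: float = 25.0) -> float:
    """Managed SaaS: pay for stored vectors plus sustained query throughput."""
    return vectors_m * per_m_vectors + qps * per_qps

def self_hosted_cost(node_cost: float = 450.0, nodes: int = 2,
                     ops_hours: float = 10.0, eng_rate: float = 90.0) -> float:
    """Self-hosted: instances plus the engineering time nobody budgets for."""
    return node_cost * nodes + ops_hours * eng_rate

m = managed_cost(vectors_m=10, qps=100)   # 10M vectors at 100 sustained QPS
s = self_hosted_cost()
print(f"managed ${m:.0f}/mo vs self-hosted ${s:.0f}/mo ({m / s:.1f}x)")
```

With these placeholder rates the managed option lands at roughly 1.8x self-hosted, inside the 1.5–3x range above — the useful part is not the constants but seeing which term dominates as QPS grows.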
If the team is already running Postgres and the corpus is under 10M vectors, pgvector is almost certainly good enough. HNSW indexing, filtered queries that hit existing Postgres indexes, and no new infrastructure to operate — the dedicated vector database can wait until measured performance says otherwise.
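For teams taking that path, the whole setup is two statements. A sketch of their shape, shown as Python strings so the syntax is visible; the `documents` table and its columns are hypothetical, so adapt them to your schema:

```python
# Sketch of the two statements a pgvector deployment needs: an HNSW index
# and a filtered nearest-neighbour query. Table and column names below
# are hypothetical; adapt to your schema.

CREATE_INDEX = """
CREATE INDEX CONCURRENTLY docs_embedding_hnsw
    ON documents USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
"""

# The tenant_id predicate can use an ordinary B-tree index; depending on
# selectivity, the planner filters before the vector scan or after it.
FILTERED_QUERY = """
SELECT id, content
FROM documents
WHERE tenant_id = %(tenant)s
ORDER BY embedding <=> %(query_vec)s
LIMIT 10;
"""

print(CREATE_INDEX.strip().splitlines()[0])
```

`<=>` is pgvector's cosine-distance operator; `m` and `ef_construction` are the standard HNSW build knobs, and the values shown are common starting points rather than tuned settings.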
Hybrid search and filtering
Hybrid search — combining dense vector similarity with keyword BM25 and sometimes rerankers — has become the default for RAG systems that need to handle both fuzzy semantic matches and precise exact-keyword queries. The four databases handle it differently.
- Weaviate's hybrid search is the most polished out of the box — one query blends vector and BM25 with a tunable alpha.
- Qdrant pairs vector search with structured payload filters at very low cost and integrates well with external BM25 stages.
- pgvector gets hybrid for free via tsvector full-text search combined with vector ordering in the same query.
- Pinecone supports hybrid through sparse-dense vectors, but the plumbing requires a separate sparse-encoder pipeline.
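Weaviate's alpha blend is easy to reason about with a toy reimplementation. A minimal sketch in the style of that parameter — the scores are made-up numbers assumed already normalized to [0, 1], not output from any real API:

```python
# Toy hybrid-score fusion in the style of Weaviate's alpha parameter:
# alpha = 1.0 is pure vector search, alpha = 0.0 is pure BM25.

def hybrid_score(vector_score: float, bm25_score: float,
                 alpha: float = 0.5) -> float:
    """Blend two normalized [0, 1] scores into one ranking key."""
    return alpha * vector_score + (1 - alpha) * bm25_score

# Hypothetical per-document (vector, bm25) scores for one query.
docs = {
    "doc_a": (0.92, 0.10),  # strong semantic match, weak keyword match
    "doc_b": (0.40, 0.95),  # exact keyword hit, weak semantic match
}

for alpha in (0.25, 0.75):
    ranked = sorted(docs, key=lambda d: hybrid_score(*docs[d], alpha),
                    reverse=True)
    print(f"alpha={alpha}: {ranked}")
```

The point of the toy: at low alpha the keyword hit wins, at high alpha the semantic match wins, and tuning that one number is the whole hybrid-search API surface. (Weaviate's production fusion also supports rank-based blending; this sketch shows only the score-weighted idea.)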
Filtering matters more than raw speed
A multi-tenant SaaS app almost always runs filtered queries (customer ID, document type, date range), and filter performance dominates real-world latency. Qdrant and pgvector (with proper indexing) handle filters cleanly because the filter evaluates against dedicated indexed structures. Pinecone's metadata filtering adds noticeable latency at scale because the filter is evaluated while traversing the vector index rather than against a separate index.
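Why filter order matters is visible even in a brute-force toy. A sketch with synthetic one-dimensional "embeddings" — all data here is random and purely illustrative:

```python
# Why filter order matters: post-filtering a global top-k result set can
# starve a small tenant of results, while pre-filtering cannot.

import random

random.seed(7)

# 1000 synthetic points: (id, tenant, 1-D "embedding"); "acme" owns ~2%.
points = [(i, "acme" if random.random() < 0.02 else "other", random.random())
          for i in range(1000)]
query, k = 0.5, 10

def post_filter(tenant: str) -> list:
    top = sorted(points, key=lambda p: abs(p[2] - query))[:k]  # search first
    return [p for p in top if p[1] == tenant]                  # then filter

def pre_filter(tenant: str) -> list:
    mine = [p for p in points if p[1] == tenant]               # filter first
    return sorted(mine, key=lambda p: abs(p[2] - query))[:k]   # then search

print(len(post_filter("acme")), "results post-filtered vs",
      len(pre_filter("acme")), "pre-filtered")
```

A tenant owning 2% of the corpus rarely lands in the global top 10, so the post-filtered result set comes back short or empty; pre-filtering always returns up to k results. Real engines avoid the brute-force scan, but the ordering trade-off is the same one the table above is pricing.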
Operational complexity
The hidden cost of a vector database is the cost of operating it. Index rebuilds, version upgrades, snapshots, monitoring, and capacity planning all take engineering time. The right pick depends on how much of that the team wants to own.
| Database | Ops burden | Multi-tenant isolation | Backups and recovery |
|---|---|---|---|
| Pinecone | Near zero | Namespaces per tenant | Managed, limited control |
| Weaviate Cloud | Low | Multi-tenancy is first-class | Managed |
| pgvector | Already handled by Postgres ops | Row-level security or per-tenant tables | Whatever Postgres backup already does |
| Qdrant Cloud | Low | Collections or payload filters | Managed, snapshot API available |
pgvector at scale
pgvector has crossed the line from 'fine for prototypes' to 'genuinely fine for a lot of production workloads' over the past two years. HNSW indexes build in reasonable time up to tens of millions of vectors; queries are milliseconds once the index fits in shared buffers. The limit is memory, not correctness — HNSW indexes consume 2–5x the memory of IVFFlat, and when the index spills to disk, query latency climbs sharply.
- Under 10M vectors — pgvector is almost always the right default, especially on teams already running Postgres.
- 10M–50M vectors — pgvector still works on beefy instances, but the ops cost of tuning Postgres for vector workloads starts to approach the cost of a dedicated database.
- 50M+ vectors — reach for Qdrant, Weaviate, or Pinecone. pgvector can get there but the margin for error shrinks.
pgvector's HNSW is memory-hungry. Budget at least 1.5x the raw index size in shared buffers, and plan for a rebuild window whenever dimensions or the distance metric change. A forgotten REINDEX on a large table can stall a production database for longer than anyone expects.
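A rough sizing formula helps budget that memory before the index exists. A sketch using the commonly cited per-element layout (the vector itself plus neighbour link lists); the constants are ballpark approximations, not pgvector's exact on-disk format:

```python
# Rough HNSW index size estimate: per element, the stored vector plus
# neighbour link lists. Constants are ballpark, not exact pgvector layout.

def hnsw_index_gb(n_vectors: int, dim: int, m: int = 16,
                  bytes_per_float: int = 4, bytes_per_link: int = 8) -> float:
    vector_bytes = dim * bytes_per_float
    # Layer 0 keeps 2*m links per element; upper layers add roughly m more.
    link_bytes = (2 * m + m) * bytes_per_link
    return n_vectors * (vector_bytes + link_bytes) / 1e9

idx = hnsw_index_gb(n_vectors=10_000_000, dim=768)
print(f"~{idx:.1f} GB index; budget >= {1.5 * idx:.1f} GB of shared buffers")
```

At 10M 768-dimension vectors this lands in the mid-30s of GB, which is exactly the regime where "the index fits in shared buffers" stops being free and the ops-cost comparison above kicks in.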
How we pick
When our team walks into a new project and the question is which vector database to use, the decision usually falls out of four questions.
- Is the team already running Postgres and the corpus under 10M vectors? Use pgvector. Ship it.
- Is ops capacity the scarce resource and the workload bursty? Use Pinecone serverless. Pay the premium for zero ops and move on.
- Does the workload need first-class hybrid search with a simple API? Use Weaviate. The hybrid implementation is the most polished of the four.
- Is filter-heavy multi-tenant search at scale the core workload? Use Qdrant. Filtering is its strongest suit and self-hosted economics are the cheapest at volume.
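The four questions collapse into a short decision function. A sketch that just encodes the heuristics above in order — the thresholds are this article's rules of thumb, not hard limits:

```python
def pick_vector_db(runs_postgres: bool, corpus_m: float,
                   bursty_low_ops: bool, needs_hybrid: bool,
                   filter_heavy_multitenant: bool) -> str:
    """Encode the four decision questions as ordered heuristics.

    corpus_m is corpus size in millions of vectors; thresholds are
    rules of thumb, not hard limits.
    """
    if runs_postgres and corpus_m < 10:
        return "pgvector"
    if bursty_low_ops:
        return "pinecone-serverless"
    if needs_hybrid:
        return "weaviate"
    if filter_heavy_multitenant:
        return "qdrant"
    return "benchmark on real traffic"

print(pick_vector_db(runs_postgres=True, corpus_m=5, bursty_low_ops=False,
                     needs_hybrid=False, filter_heavy_multitenant=True))
```

Note the ordering is deliberate: the pgvector question comes first because it short-circuits everything else, which is the article's actual thesis.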
Key takeaways
- All four databases are fast enough for most RAG workloads. The decision is driven by ops capacity, filtering patterns, and scale, not raw latency.
- pgvector is the right default under 10M vectors — simpler, cheaper, and already handled by existing Postgres operations.
- Pinecone serverless is the right default when the team cannot afford to operate anything and the workload is bursty.
- Qdrant wins on filtered search and self-hosted cost at scale. Weaviate wins on out-of-the-box hybrid search.
- Benchmarks on synthetic data mislead. Measure on real traffic with real filters before committing.