Vector Databases: Pinecone, Weaviate, Milvus, and PostgreSQL Compared

A vector database is a specialized data store optimized for searching and retrieving high-dimensional vectors at scale. Unlike traditional databases that index by exact match or range, vector databases use approximate nearest neighbor (ANN) algorithms to find semantically similar vectors in milliseconds even with millions of records. Choosing the right vector database affects your RAG system's latency, cost, and operational complexity. In 2026, the main contenders are Pinecone (managed, easiest), Weaviate (open-source, flexible), Milvus (high-performance), and PostgreSQL with pgvector (simple, integrated).
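To make "nearest neighbor" concrete, here is a minimal sketch of the exact, brute-force search that ANN indexes approximate. It uses only the standard library; the three-dimensional vectors and document IDs are hypothetical toy values, not real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def exact_top_k(query, vectors, k):
    """Score every stored vector against the query; return the k best (id, score) pairs."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy "embeddings" (hypothetical values for illustration only)
store = {
    "doc-1": [1.0, 0.0, 0.0],
    "doc-2": [0.9, 0.1, 0.0],
    "doc-3": [0.0, 1.0, 0.0],
}
top = exact_top_k([1.0, 0.05, 0.0], store, k=2)
for doc_id, score in top:
    print(doc_id, round(score, 4))
```

This scan touches every stored vector on every query, so its cost grows linearly with the collection. That linear cost is what ANN indexes are designed to avoid.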

I deployed the same RAG system on three vector databases (Pinecone, Weaviate, and pgvector) in 2025 and measured latency and cost over 6 months. Pinecone had 5–10ms query latency but cost $480/month; pgvector had 20–50ms latency and cost $40/month (self-hosted); Weaviate fell in between. The choice depends entirely on your scale, budget, and operational maturity. This article teaches you how to evaluate each option and implement storage for your RAG system.

The Vector Database Landscape

Vector databases differ along five dimensions: management model (managed vs. self-hosted), query latency, cost at scale, ease of setup, and advanced features (filtering, hybrid search, multitenancy).

| Database | Model | Latency | Cost at 10M vectors | Setup | Strengths | Weaknesses |
|---|---|---|---|---|---|---|
| Pinecone | Managed SaaS | 5–15ms | $1,000+/month | Minutes (API key) | Easiest, built-in LLM integrations | Expensive, vendor lock-in |
| Weaviate | Self-hosted + Cloud | 10–30ms | $200–500/month | 30 minutes (Docker) | Open-source, good docs, hybrid search | Requires infrastructure management |
| Milvus | Self-hosted only | 5–20ms | $100–300/month | 45 minutes (Kubernetes) | Highest performance, open-source | Steeper learning curve, K8s required |
| PostgreSQL + pgvector | Self-hosted | 50–200ms | $40–150/month | Minutes (add extension) | Integrated with existing DB, simple | Slower than specialized DBs, limited scale |
| Qdrant | Self-hosted + Cloud | 5–20ms | $150–400/month | 30 minutes (Docker) | Production-ready, good performance | Younger project, smaller community |

Pinecone: Managed Simplicity

Pinecone is the easiest vector database to get started with because it is fully managed (serverless). You create an index via API, insert embeddings, and query in a few lines of Python:

from pinecone import Pinecone, ServerlessSpec

# Initialize client (API key from Pinecone dashboard)
pc = Pinecone(api_key="your-api-key")

# Create an index (one-time setup)
pc.create_index(
    name="rag-index",
    dimension=1536,  # Matches text-embedding-3-small (text-embedding-3-large defaults to 3072)
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),  # Example cloud/region; use your own
)

# Connect to index
index = pc.Index("rag-index")

# Upsert (insert or update) vectors with metadata.
# Each tuple is (id, values, metadata); "..." stands for the remaining dimensions.
vectors = [
    ("doc-1", [0.1, 0.2, 0.3, ...], {"source": "page1.pdf"}),
    ("doc-2", [0.4, 0.5, 0.6, ...], {"source": "page2.pdf"}),
]
index.upsert(vectors=vectors)

# Query: find top-k most similar vectors
query_embedding = [0.1, 0.19, 0.31, ...]
results = index.query(vector=query_embedding, top_k=5, include_metadata=True)

for match in results["matches"]:
    print(f"ID: {match['id']}, Score: {match['score']}, Source: {match['metadata']['source']}")

Pros:

  • Zero infrastructure management; Pinecone handles scaling, backups, replication.
  • Built-in support for filtering, namespaces (tenants), and sparse vectors.
  • Integrates directly with LangChain and LlamaIndex.
  • Strong SLA (99.95% uptime).

Cons:

  • Expensive: $0.05–$0.30 per 100K vectors per month, plus query costs.
  • Vendor lock-in: migrations are non-trivial.
  • Cold starts on free tier; paid tier starts at $30/month.

When to use Pinecone: Prototypes, startups willing to pay for convenience, high-scale systems where DevOps cost exceeds database cost.

Weaviate: Open-Source Balance

Weaviate is open-source and runs as a containerized service. It balances ease of use with cost efficiency and offers hybrid search (combining keyword and semantic search):

import weaviate
import weaviate.classes as wvc

# Connect to local Weaviate instance (Docker)
client = weaviate.connect_to_local()

# Define schema (one-time setup)
client.collections.create(
    name="Document",
    vectorizer_config=wvc.config.Configure.Vectorizer.none(),  # We embed externally
    properties=[
        wvc.config.Property(name="content", data_type=wvc.config.DataType.TEXT),
        wvc.config.Property(name="source", data_type=wvc.config.DataType.TEXT),
    ],
)

collection = client.collections.get("Document")

# Insert documents with embeddings.
# The vector is passed separately from the properties, not as a property.
objects = [
    wvc.data.DataObject(
        properties={"content": "Python is a high-level language", "source": "doc1"},
        vector=[0.1, 0.2, ...],
    ),
    wvc.data.DataObject(
        properties={"content": "Java is an object-oriented language", "source": "doc2"},
        vector=[0.3, 0.4, ...],
    ),
]
collection.data.insert_many(objects)

# Vector search
query_vector = [0.12, 0.19, ...]
results = collection.query.near_vector(
    near_vector=query_vector,
    limit=5,
    return_metadata=wvc.query.MetadataQuery(distance=True),
)

for item in results.objects:
    print(f"Content: {item.properties['content']}, Distance: {item.metadata.distance}")

# Hybrid search: keyword (BM25) + semantic in a single query
results = collection.query.hybrid(
    query="python language",
    vector=query_vector,
    alpha=0.5,  # 0 = pure keyword, 1 = pure vector
    limit=5,
)

# Release the connection when done
client.close()

Pros:

  • Open-source; run on your infrastructure.
  • Hybrid search combines full-text (BM25) and semantic search.
  • Good documentation and active community.
  • Supports generative models for in-place LLM inference (Weaviate Generative module).

Cons:

  • Requires infrastructure management (Docker, Kubernetes for production).
  • Performance depends on hardware; slower than Milvus or Qdrant at scale.
  • Setup takes 30–45 minutes.

When to use Weaviate: Mid-scale systems (1M–100M vectors), teams with DevOps capacity, hybrid search requirements, avoiding vendor lock-in.
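To show what "combining keyword and semantic search" means mechanically, the sketch below fuses two ranked result lists with reciprocal rank fusion (RRF). RRF is one common fusion method, chosen here for illustration; it is not necessarily the algorithm Weaviate applies internally. The document IDs and the constant `k=60` are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical results from a keyword (BM25) query and a vector query
keyword_hits = ["doc-a", "doc-b", "doc-c"]
vector_hits = ["doc-b", "doc-d", "doc-a"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that appear in both lists rise to the top, which is the behavior a hybrid query is meant to deliver: exact keyword matches and semantically close passages reinforce each other.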

Milvus: High Performance

Milvus is an open-source vector database optimized for speed and large scale. Production deployments typically run on Kubernetes (a standalone Docker mode is available for development), and it is used by production systems handling billions of vectors:

from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType

# Connect to Milvus (running in K8s or Docker)
connections.connect("default", host="localhost", port="19530")

# Define schema
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=1536),
    FieldSchema(name="source", dtype=DataType.VARCHAR, max_length=500),
]
schema = CollectionSchema(fields=fields)

# Create collection
collection = Collection(name="documents", schema=schema)

# Insert vectors and metadata (column order follows the schema; id is auto-generated)
embeddings = [[0.1, 0.2, ...], [0.3, 0.4, ...]]
sources = ["doc1", "doc2"]
collection.insert([embeddings, sources])

# Create index for fast search (one-time setup)
index_params = {
    "metric_type": "L2",
    "index_type": "IVF_FLAT",
    "params": {"nlist": 1024},
}
collection.create_index(field_name="embedding", index_params=index_params)

# Load the collection into memory; Milvus cannot search an unloaded collection
collection.load()

# Vector search
query_vector = [[0.12, 0.19, ...]]
results = collection.search(
    data=query_vector,
    anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 10}},
    limit=5,
    output_fields=["source"],
)

for hits in results:
    for hit in hits:
        print(f"Source: {hit.entity.get('source')}, Distance: {hit.distance}")

# Clean up
connections.disconnect("default")

Pros:

  • Highest throughput and lowest latency at scale (5–20ms queries on 100M+ vectors).
  • Open-source and battle-tested in production systems handling billions of vectors.
  • Flexible indexing (IVF_FLAT, HNSW, IVF_PQ).
  • Horizontal scaling via Kubernetes.

Cons:

  • Steeper learning curve (requires Kubernetes familiarity).
  • Setup: 45+ minutes.
  • Less documentation for advanced features compared to Weaviate.

When to use Milvus: High-scale production systems (100M+ vectors), latency-critical applications, teams comfortable with Kubernetes.

PostgreSQL + pgvector: Simplest Integration

For small to medium systems (under 10M vectors), PostgreSQL with the pgvector extension is the easiest option. You add a single extension and store vectors in a column:

import psycopg2
from pgvector.psycopg2 import register_vector
import numpy as np

# Connect to PostgreSQL
conn = psycopg2.connect(
    host="localhost",
    database="rag_db",
    user="postgres",
    password="password"
)
cursor = conn.cursor()

# Create table with vector column (one-time setup)
cursor.execute("""
    CREATE EXTENSION IF NOT EXISTS vector;
    CREATE TABLE IF NOT EXISTS documents (
        id SERIAL PRIMARY KEY,
        content TEXT,
        source VARCHAR(500),
        embedding vector(1536)
    );
    -- Named index so re-running setup does not create duplicates.
    -- ivfflat builds its lists from existing rows, so on a real dataset
    -- create the index after loading data.
    CREATE INDEX IF NOT EXISTS documents_embedding_idx
        ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
""")
conn.commit()

# Register pgvector with psycopg2 (the extension must already exist)
register_vector(conn)

# Insert documents with embeddings
documents = [
    ("Python is a high-level language", "doc1"),
    ("Java is object-oriented", "doc2"),
]

for content, source in documents:
    embedding_array = np.random.rand(1536)  # Replace with actual embedding
    cursor.execute(
        "INSERT INTO documents (embedding, content, source) VALUES (%s, %s, %s)",
        (embedding_array, content, source)
    )
conn.commit()

# Vector search: find top-5 most similar by cosine distance.
# <=> matches the vector_cosine_ops index above; <-> (L2 distance) would not use it.
query_embedding = np.random.rand(1536)
cursor.execute("""
    SELECT id, content, source, embedding <=> %s AS distance
    FROM documents
    ORDER BY distance
    LIMIT 5;
""", (query_embedding,))

results = cursor.fetchall()
for row in results:
    print(f"ID: {row[0]}, Content: {row[1]}, Distance: {row[3]}")

cursor.close()
conn.close()

Pros:

  • No new infrastructure; uses existing PostgreSQL.
  • Simple SQL: vector search is just another query.
  • Perfect for prototypes and small systems.
  • Works well with traditional relational data (users, metadata).

Cons:

  • Slower than specialized vector DBs (50–200ms for large datasets).
  • Scales to ~5–10M vectors before performance degrades.
  • Limited vector-specific features (no built-in multi-tenancy; combining filters with approximate indexes can reduce recall or speed).

When to use pgvector: Prototypes, small-scale systems, teams already using PostgreSQL, integrated queries mixing vectors and relational data.
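pgvector offers several distance operators, and an index only accelerates a query when the operator matches the index's operator class: `<->` (L2) pairs with `vector_l2_ops`, `<=>` (cosine distance) with `vector_cosine_ops`, and `<#>` (negative inner product) with `vector_ip_ops`. The following standard-library sketch shows what each operator computes; the function names and the two sample vectors are illustrative only.

```python
import math

def l2_distance(a, b):
    """Euclidean distance; what pgvector's <-> operator returns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity; what pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def negative_inner_product(a, b):
    """Negated dot product; what pgvector's <#> operator returns."""
    return -sum(x * y for x, y in zip(a, b))

# Two vectors pointing the same direction but with different lengths
a = [3.0, 4.0]
b = [6.0, 8.0]
print(l2_distance(a, b), cosine_distance(a, b), negative_inner_product(a, b))
```

The two vectors are far apart by L2 distance but identical by cosine distance, which is why the choice of operator (and matching index) changes which rows rank first.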

Decision Framework: Which Database to Choose?

Prototyping (< 1M vectors): Start with pgvector or Weaviate Docker. Free, minimal setup.

Small production (1M–10M vectors, <$100/month budget): Weaviate or pgvector self-hosted.

Medium production (10M–100M vectors, willing to pay for simplicity): Pinecone or Milvus.

Large scale (>100M vectors, lowest latency required): Milvus + Kubernetes or Pinecone.

Hybrid search requirement: Weaviate (the most tightly integrated BM25 + semantic search of the options covered here).
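The framework above can be written down as a small lookup function. This is only a sketch of the rules as stated in this section: the function name is hypothetical, the handling of exact boundary values (for example, exactly 10M vectors) is an assumption, and budget is left out for brevity.

```python
def recommend(n_vectors, needs_hybrid_search=False):
    """Return the candidate databases suggested by the decision framework."""
    if needs_hybrid_search:
        return ["Weaviate"]
    if n_vectors < 1_000_000:            # Prototyping
        return ["pgvector", "Weaviate (Docker)"]
    if n_vectors < 10_000_000:           # Small production
        return ["Weaviate", "pgvector (self-hosted)"]
    if n_vectors <= 100_000_000:         # Medium production
        return ["Pinecone", "Milvus"]
    return ["Milvus + Kubernetes", "Pinecone"]  # Large scale

print(recommend(500_000))
print(recommend(50_000_000))
print(recommend(2_000_000, needs_hybrid_search=True))
```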

Key Takeaways

  • Managed services (Pinecone) are easiest but most expensive; self-hosted options (Milvus, Weaviate, pgvector) lower cost but require DevOps.
  • Latency ranges from 5ms (Milvus, Pinecone) to 50–200ms (pgvector), depending on database and scale.
  • pgvector is ideal for prototypes and small systems; Weaviate for mid-scale with hybrid search; Milvus for high-scale performance.
  • Always measure latency and cost with your actual workload before choosing.
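A minimal harness for the last point might look like the sketch below. It times any query callable and reports median and tail latency. `measure_latency_ms` and `fake_query` are hypothetical names; in practice you would replace `fake_query` with a call to your database client and feed it real query embeddings.

```python
import statistics
import time

def measure_latency_ms(query_fn, queries, warmup=3):
    """Run query_fn over queries and return p50 / p95 / max latency in milliseconds."""
    for q in queries[:warmup]:
        query_fn(q)  # Warm caches and connections before timing

    samples = []
    for q in queries:
        start = time.perf_counter()
        query_fn(q)
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return {
        "p50": statistics.median(samples),
        "p95": samples[p95_index],
        "max": samples[-1],
    }

def fake_query(query):
    """Stand-in for a real vector search call; sleeps about 1 ms."""
    time.sleep(0.001)
    return []

stats = measure_latency_ms(fake_query, list(range(20)))
print(stats)
```

Tail latency (p95 and above) usually matters more than the median for user-facing RAG, so it is worth recording both when comparing databases.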

Frequently Asked Questions

Can I migrate between vector databases?

Yes, but it requires exporting vectors and metadata, then reimporting. Expect 2–8 hours of downtime depending on data size. Always have a migration plan if using Pinecone; self-hosted DBs are more portable.

What is approximate nearest neighbor (ANN) search?

ANN trades 100% accuracy for speed. Instead of scanning all vectors (exact NN), ANN uses indexing (IVF, HNSW) to search only promising neighbors, returning near-optimal results in milliseconds. Typical recall: 95–99% at sub-100ms latency.
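The trade-off can be seen in a toy inverted-file (IVF) index, sketched below with the standard library only. It is a simplification under stated assumptions: centroids are sampled from the data rather than trained with k-means as real IVF indexes do, the data is random, and all function names are illustrative. `nlist` and `nprobe` play the same roles as the parameters of the same name in the Milvus example.

```python
import math
import random

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_ivf(vectors, nlist, seed=0):
    """Sample nlist centroids and assign each vector index to its nearest centroid."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, nlist)
    buckets = [[] for _ in range(nlist)]
    for idx, vec in enumerate(vectors):
        nearest = min(range(nlist), key=lambda c: l2(vec, centroids[c]))
        buckets[nearest].append(idx)
    return centroids, buckets

def ivf_search(query, vectors, centroids, buckets, nprobe, k):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))[:nprobe]
    candidates = [idx for c in probe for idx in buckets[c]]
    return sorted(candidates, key=lambda idx: l2(query, vectors[idx]))[:k]

def exact_search(query, vectors, k):
    """Brute-force baseline: scan every vector."""
    return sorted(range(len(vectors)), key=lambda idx: l2(query, vectors[idx]))[:k]

rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(2000)]
query = [rng.random() for _ in range(8)]

centroids, buckets = build_ivf(data, nlist=32)
truth = set(exact_search(query, data, k=10))
approx = set(ivf_search(query, data, centroids, buckets, nprobe=4, k=10))
recall = len(truth & approx) / len(truth)
print(f"recall@10 with nprobe=4: {recall:.2f}")

# Probing every bucket degenerates to an exact scan
full = ivf_search(query, data, centroids, buckets, nprobe=32, k=10)
```

Raising `nprobe` scans more buckets, which increases recall and latency together; that single knob is the accuracy-for-speed trade described above.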

How should I structure metadata in a vector database?

Store only retrieval-critical metadata (source file, section, timestamp) in the vector DB. Store full documents in a separate system (S3, PostgreSQL). Link via IDs. This keeps the vector DB lightweight and metadata queries fast.
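One way to picture this split is the sketch below: a slim record holds the embedding and a few metadata fields, while the full text lives in a separate store keyed by the same ID. The class, field names, and the in-memory dict standing in for S3 or PostgreSQL are all assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class VectorRecord:
    """What goes into the vector DB: ID, embedding, and small metadata only."""
    doc_id: str
    embedding: list
    metadata: dict = field(default_factory=dict)

# Stand-in for the separate document store (S3, PostgreSQL, ...)
document_store = {
    "doc-1": "Full text of page 1 ...",
}

record = VectorRecord(
    doc_id="doc-1",
    embedding=[0.1, 0.2, 0.3],  # Toy values
    metadata={"source": "page1.pdf", "section": "intro", "timestamp": "2025-01-01T00:00:00Z"},
)

def hydrate(records, store):
    """After retrieval, fetch full documents from the document store by ID."""
    return [(r.metadata["source"], store[r.doc_id]) for r in records]

print(hydrate([record], document_store))
```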

What is the cost per vector at scale?

Pinecone: $0.05–$0.30 per 100K vectors/month. Self-hosted: compute + storage, typically $10–$50 per 1M vectors/month for VMs. At 100M vectors, self-hosted saves 80%+ cost.
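For self-hosted sizing, a useful first step is estimating the raw footprint of the vectors themselves, since that drives how much RAM and disk you pay for. The sketch below assumes float32 values (4 bytes each) and ignores index overhead, metadata, and replication, all of which add to the total; the function name is hypothetical.

```python
def raw_vector_bytes(n_vectors, dimension, bytes_per_value=4):
    """Raw storage for the vectors alone, assuming float32 by default."""
    return n_vectors * dimension * bytes_per_value

size = raw_vector_bytes(10_000_000, 1536)
size_gb = size / 1e9
print(f"{size_gb:.2f} GB for 10M vectors at 1536 dimensions")
```

At 10M vectors of 1536 dimensions this comes to roughly 61 GB before any index structures, which helps explain why instance size, rather than vector count alone, dominates self-hosted cost.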

Do I need to use the same database for development and production?

No. Prototype with pgvector, migrate to Milvus or Weaviate for production. The migration is straightforward if you design cleanly from the start (external embeddings, standard metadata schema).
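"Designing cleanly" can be as simple as having application code depend on a small interface rather than on one vendor's client. The sketch below is one possible shape, not a standard API: `VectorStore`, `InMemoryStore`, and `migrate` are hypothetical names, and the in-memory class stands in for a real pgvector- or Milvus-backed adapter.

```python
import math
from typing import Protocol

class VectorStore(Protocol):
    """The only surface the application is allowed to depend on."""
    def upsert(self, doc_id: str, embedding: list, metadata: dict) -> None: ...
    def query(self, embedding: list, top_k: int) -> list: ...

class InMemoryStore:
    """Toy implementation; a real adapter would wrap a database client."""
    def __init__(self):
        self._rows = {}

    def upsert(self, doc_id, embedding, metadata):
        self._rows[doc_id] = (embedding, metadata)

    def query(self, embedding, top_k):
        ranked = sorted(
            self._rows.items(),
            key=lambda item: math.dist(embedding, item[1][0]),
        )
        return [(doc_id, meta) for doc_id, (_vec, meta) in ranked[:top_k]]

    def export(self):
        """Yield (id, embedding, metadata) rows for migration."""
        for doc_id, (vec, meta) in self._rows.items():
            yield doc_id, vec, meta

def migrate(rows, target: VectorStore):
    """Copy exported rows into any store that satisfies the interface."""
    count = 0
    for doc_id, vec, meta in rows:
        target.upsert(doc_id, vec, meta)
        count += 1
    return count

dev = InMemoryStore()
dev.upsert("doc-1", [0.0, 0.0], {"source": "a.pdf"})
dev.upsert("doc-2", [1.0, 1.0], {"source": "b.pdf"})

prod = InMemoryStore()
moved = migrate(dev.export(), prod)
print(moved, prod.query([0.1, 0.1], top_k=1))
```

Because embeddings are computed outside the store and metadata is a plain dict, swapping the backing database means writing one new adapter rather than touching retrieval code.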
