pgvector vs Qdrant: Results from the 1M OpenAI Benchmark

You may have considered using PostgreSQL's pgvector extension for vector similarity search. Our benchmark suggests this option is strictly inferior to dedicated vector search engines such as Qdrant.

We ran both engines through ann-benchmarks, a suite dedicated solely to benchmarking vector search. The difference in performance is staggering.

Query Speed

Final results show that pgvector lags behind Qdrant by a factor of 15 when it comes to throughput.

That is a fifteen-fold gap in throughput. However, speed shouldn't be the only metric when evaluating a database. In terms of accuracy, pgvector also delivers far fewer relevant results than Qdrant.

Workload

Interestingly, these disparities start to surface with as few as 100,000 chunked documents.

As an ardent PostgreSQL supporter, I find it disheartening that pgvector doesn't just start at under half of Qdrant's QPS at 100,000 vectors - it plunges precipitously beyond that.

Correctness

One might try to rationalize this by assuming that Postgres is slower but more accurate. The data reveals that pgvector is not just slower, it is also ~18% less accurate!

We measure this using the same methodology as the ann-benchmarks codebase: brute-force k-NN as the ground truth.
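For the curious, here is a minimal sketch of that methodology in numpy: exact brute-force k-NN as the ground truth, and recall@k as the accuracy score. This is illustrative only, not the benchmark harness itself, and it assumes L2-normalised embeddings.

import numpy as np

def brute_force_knn(queries, corpus, k=10):
    # Exact k-NN: O(n) per query, but it gives the true neighbours.
    # Cosine similarity on L2-normalised vectors reduces to a dot product.
    scores = queries @ corpus.T
    return np.argsort(-scores, axis=1)[:, :k]

def recall_at_k(ground_truth_ids, retrieved_ids):
    # Fraction of true nearest neighbours the engine actually returned.
    hits = sum(len(set(gt) & set(ret)) for gt, ret in zip(ground_truth_ids, retrieved_ids))
    return hits / ground_truth_ids.size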

Latency

Here, Qdrant holds its own. The worst p95 latency for Qdrant is 2.85s, a stark contrast to pgvector, whose best p95 latency is a full 4.02s. Even more astonishing, pgvector's worst p95 latency skyrockets to an unbelievable 45.46s.

Benchmark Specs

The machine we used to run the benchmark: t3.2xlarge, 8 vCPU, 32GB RAM

For the data enthusiasts among us, this Google Sheet details all the numbers for a more in-depth analysis.

Configuration

We use the default configuration for Qdrant and better-than-recommended parameters for pgvector:

Qdrant(quantization=False, m=16, ef_construct=128, grpc=True, hnsw_ef=None, rescore=True)
PGVector(lists=200, probes=2)

The recommended pgvector configuration, which would likely perform even worse:

PGVector(lists=1000, probes=1)
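For context, here is roughly what those parameters map onto in each engine. This is a hedged sketch only: the collection name openai_1m and the table/column names items/embedding are placeholders, not taken from the benchmark code.

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, HnswConfigDiff, VectorParams
import psycopg2

# Qdrant: HNSW index with m=16 and ef_construct=128 (the defaults used above)
qdrant = QdrantClient(host="localhost", grpc_port=6334, prefer_grpc=True)
qdrant.create_collection(
    collection_name="openai_1m",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    hnsw_config=HnswConfigDiff(m=16, ef_construct=128),
)

# pgvector: IVFFlat index with lists=200; probes is set per session at query time
conn = psycopg2.connect("dbname=bench")
with conn.cursor() as cur:
    cur.execute("CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops) WITH (lists = 200);")
    cur.execute("SET ivfflat.probes = 2;")
conn.commit()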

There is much more to be tested. We will continue to explore the configuration space for both platforms and update this.

Conversations with the Community

Paul Copplestone (CEO, Supabase) has also shared his thoughts on the matter:

Yup:

  1. Wait 6 months, a lot of development is happening on pgvector
  2. Use hybrid search
  3. Use filters on other indexed columns
  4. Use partitions

And as always, take benchmarks with a grain of salt, they are never as clear-cut as they seem. We’ll publish benchmarks soon too using the latest version of pgvector

Adding my notes here:

pgvector falls back to a full scan when there are filters or hybrid search. That is very slow with 1536-dimensional embeddings: it's O(n), where n is the number of vectors matching the filter.

When there are no filters, pgvector uses IVF. With 1536-dimensional embeddings, IVF is both slower and less accurate than Qdrant's HNSW.
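To make the IVF tradeoff concrete, here is a small self-contained sketch with numpy and scikit-learn. The corpus size, dimensionality, and cluster count are toy values chosen so it runs quickly; the real benchmark used 1M 1536-dimensional vectors. The idea: assign vectors to coarse clusters, probe only the nearest couple of clusters for a query, and compare against an exact full scan. With high-dimensional embeddings and few probes, many of the true top-10 neighbours sit in clusters that never get visited.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
corpus = rng.normal(size=(20_000, 256)).astype(np.float32)   # toy data, not real embeddings
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)
query = corpus[0] + 0.1 * rng.normal(size=256).astype(np.float32)
query /= np.linalg.norm(query)

# Exact search: a full O(n) scan, used as the ground truth.
exact_top10 = np.argsort(-(corpus @ query))[:10]

# IVF-style search: coarse clustering, then probe only the nearest clusters.
kmeans = KMeans(n_clusters=100, n_init=1, random_state=0).fit(corpus)
probe = np.argsort(np.linalg.norm(kmeans.cluster_centers_ - query, axis=1))[:2]  # probes=2
candidates = np.where(np.isin(kmeans.labels_, probe))[0]
ivf_top10 = candidates[np.argsort(-(corpus[candidates] @ query))[:10]]

print("recall@10:", len(set(exact_top10) & set(ivf_top10)) / 10)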

Aside: Feel free to check out my Twitter Intro to IVFPQ.

@jobergum, creator of Vespa.ai (a vector search engine) also shared his thoughts:

pgvector is an extension which by default will just search the closest cluster to the query vector, which for most high-dimensional embedding models will return just 2-3 out of 10 real neighbors.

This is a very important point. pgvector is not a vector search engine. It's a vector extension for PostgreSQL, and that involves some tradeoffs which are sometimes not obvious.

There is a US$2000 bounty for anyone who can raise a PR to make the pgvector extension use HNSW instead of IVF.

Acknowledgements

The engineering and the dataset were both done by Kumar Shivendu. Most of my contribution was in spotting bottlenecks, giving feedback, and sponsorship.

These surprising revelations are courtesy of Erik Bernhardsson's ann-benchmarks code.

Airbnb's Metric Store: Minerva

Data lineage is a problem because at most companies data passes through several tables and queries before humans consume it!

This has well-known challenges: changes do not propagate downstream from the source, and reliable (fresh, up-to-date, and complete) data is not always available.

What does Minerva do?

I was expecting Minerva to be a database (a collection of tables), but it turns out that Minerva is what I'll call a Data Transformation Manager.

It overlaps quite a bit with dbt, but it's not a pure execution layer: it also stores metadata, orchestrates the DAG itself, and provides a way to query the data (Hive/Druid here).

Minerva solves one major problem in the analytics space: time to insights, as Julie Zhuo has mentioned several times at Sundial.

Minerva 1.0

This is a brief look at the past: what problems they solved first, what was left undone, and some of their tooling/technology choices.

Pre-computation engine

Quite similar to how we were building Sundial till very recently.

  1. De-normed measures and dimensions
  2. Segments and behaviours were both separated out

De-norming measures can be quite expensive but useful. We converged on this design across multiple clients while working with event-level data. We also see some of our clients maintaining a similar table: a "User Segments Table".

Tradeoffs in the Precomputing Approach

  1. Cubing: Minerva already knows upfront what SQL query to run, across which segments and durations. This means it can leverage CUBE operations.

Some people believe OLAP CUBE has fallen out of use, but that's clearly not true here. As companies get larger, the "old" pressures on compute and storage re-appear, and so do well-known solutions like cubing (there's a small sketch of a CUBE query after this list).

  2. Fast query time: since the results are precomputed, queries are fast.

  3. Exponential query cost: backfills are damn expensive and waste both time and money.

Everything has to be asked ahead of time, so you end up calculating too many things
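To show what leveraging CUBE operations buys you, here is a toy sketch. DuckDB is used only to make it runnable; the table and column names are made up, and Minerva's actual generated SQL isn't shown in the talk. A single CUBE query precomputes the metric for every combination of segments, instead of firing one GROUP BY per slice.

import duckdb

con = duckdb.connect()
con.execute("""
    CREATE TABLE bookings AS
    SELECT * FROM (VALUES
        ('2024-01-01', 'US', 'web',    3),
        ('2024-01-01', 'US', 'mobile', 5),
        ('2024-01-01', 'IN', 'web',    2),
        ('2024-01-02', 'IN', 'mobile', 7)
    ) AS t(ds, country, platform, nights)
""")

# Every (ds, country, platform) rollup combination in a single pass.
cube = con.execute("""
    SELECT ds, country, platform, SUM(nights) AS nights
    FROM bookings
    GROUP BY CUBE (ds, country, platform)
""").fetchall()
print(cube)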

Minerva 2.0

This is what a truly "modern" data transformation manager should look like in my opinion.

Here are some of the design choices:

  1. On the fly joins
  2. On the fly aggregations
  3. Optional denorm and cubing
  4. Precompute that can optionally be turned on

The way I see it, this is striking a balance between flexibility (the ability to do on-the-fly joins and aggregations) and cost (the ability to precompute with denorm and cubing).
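One way to picture that balance is a per-metric switch between on-the-fly and precomputed execution. The snippet below is entirely hypothetical (the field names are mine, not Minerva's); it only illustrates the shape of the tradeoff.

# Hypothetical metric definition, not Minerva's actual format.
nights_booked = {
    "name": "nights_booked",
    "source": "fct_bookings",            # joined with dimension tables on the fly
    "aggregation": "sum",
    "dimensions": ["ds", "country", "platform"],
    "precompute": {
        "enabled": True,                  # opt in only for hot, expensive metrics
        "denormalize": False,             # optional denorm ...
        "cube": ["country", "platform"],  # ... and optional cubing
    },
}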

Engineering Choices

Moved from Druid to StarRocks

Why StarRocks?

Minerva is a SQL generation tool, not a Druid ingestion tool

Minerva has a SQL interface now; earlier it was JSON

SQLGlot, a Python SQL parser and transpiler: this is very similar to how dbt generates SQL via a parser and transpiler. SQLGlot is open source btw: https://github.com/tobymao/sqlglot
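A quick sketch of what SQLGlot does: parse SQL written for one engine into an AST, then render it for another. The query is made up, and I transpile Hive SQL to the MySQL dialect here only because StarRocks speaks a MySQL-compatible protocol.

import sqlglot

hive_sql = "SELECT ds, COUNT(DISTINCT user_id) AS dau FROM events GROUP BY ds"

# Parse into an AST, then render the same query for another dialect.
expression = sqlglot.parse_one(hive_sql, read="hive")
print(expression.sql(dialect="mysql"))

# Or in a single call:
print(sqlglot.transpile(hive_sql, read="hive", write="mysql")[0])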

Near Real Time Metrics

Summary of changes made for 2.0 release

The major change is that SQL is now a first-class citizen

This is quite important. We should resist the temptation to invent a Python transformation layer. While some Python is inevitable for more interesting things like forecasting, using Python to calculate ratios is overkill. We should instead push the limits of SQL first.

SQL is not only more widely spoken, it is also a lot more efficient and scalable. The downside? It's a less general-purpose language, and we'd have to write some tooling to make SQL work like Python.
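As a toy illustration of pushing a ratio into SQL instead of a Python layer (DuckDB used only to keep it runnable; the table and column names are made up):

import duckdb

con = duckdb.connect()
con.execute("""
    CREATE TABLE sessions AS
    SELECT * FROM (VALUES ('s1', TRUE), ('s2', FALSE), ('s3', TRUE), ('s4', FALSE))
        AS t(session_id, converted)
""")

# The "transformation" is just SQL; no pandas round-trip needed for a ratio metric.
rate = con.execute(
    "SELECT AVG(CASE WHEN converted THEN 1.0 ELSE 0.0 END) AS conversion_rate FROM sessions"
).fetchone()[0]
print(rate)  # 0.5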


These are some notes and screen grabs from a talk I found on YouTube. Thanks to Kunal Kundu for finding the talk link which I'd lost!

Beyond First 90 Days

This one's gonna be brief: two less obvious ideas I'd share with my younger self.

I am assuming that you already know the hygiene factors: make few promises, keep most of them, and exceed at least a few. Get to roughly the top 5% in the skill of effort estimation, at the very least for your own work. And so on.

Contribute to Developer Ecosystem

Improving any part of the developer ecosystem is useful and visible at the same time. For instance, let's say you add tests for a code path on which 10 developers are working. You've made the lives of 10 developers easier. They'll remember this when you come to them for help.

For some projects/teams, even the build is slow and error-prone. Any improvement there also saves a lot of contributor and developer time.

As Joel Spolsky (the person behind Stack Overflow) wrote, there is more than one way to help:

  • Maintain an issue tracker
  • Write a decent functional specification

You get the gist. Get creative and figure out points of leverage: low effort, high return on your time.

Engineering Brand Efforts

You already know the 1-2 things your team is best at, e.g. speed, scale, cadence, or software quality.

Take those two topics and write down five reasons or points of evidence for why your team is best at them. For instance, if I were writing about "speed", one of my points would be: we make 20 releases a week to almost 500K users. Or: we have fewer than 20 bugs per release thanks to our amazing testing and QA friends.

Now expand these five points into a short, bullet-point essay like this one. Ask your manager and other senior engineers for feedback. Say something like, "Hey, I wrote down what our team does best. Do you think I captured the essence and the reasoning?"

Done?

Great, now go write this up as an internal and external blog post. Submit it to a technical conference that cares about the dimension your team is best at. Bringing accolades to the team, with their blessing, yields much higher returns than reading 10 Medium blog posts.