Muvera: Making multi-vector retrieval as fast as single-vector search (research.google)
trengrj 2 hours ago
We added Muvera to Weaviate recently https://weaviate.io/blog/muvera and also have a nice podcast on it https://www.youtube.com/watch?v=nSW5g1H4zoU.

When looking at multi-vector / ColBERT-style approaches, the embedding-per-token approach can massively increase costs. You might go from a single 768-dimensional vector to 128 x 130 = 16,640 dimensions (a back-of-envelope comparison is sketched below). Even with better results from a multi-vector model, this can make it infeasible for many use cases.
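
Rough back-of-envelope with those numbers (illustration only, the exact token counts and dimensions depend on the model):

    dims_single = 768
    tokens, dims_per_token = 128, 130        # figures from the example above
    multi = tokens * dims_per_token          # 16,640 floats per document
    print(multi, multi / dims_single)        # roughly 21.7x the storage and scoring cost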

Muvera converts the multiple vectors into a single fixed-dimension (usually net smaller) vector that can be used by any ANN index. Since you now have a single vector, you can reuse all your existing ANN algorithms and stack other quantization techniques on top for memory savings. In my opinion it is a much better approach than PLAID because it doesn't require specific index structures or clustering assumptions, and it can achieve lower latency.
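
A rough numpy sketch of the fixed dimensional encoding idea, as I understand it from the paper (not Weaviate's actual implementation): SimHash partitions token vectors into buckets; query tokens are summed per bucket and document tokens averaged, so the dot product of the two flattened encodings approximates the ColBERT MaxSim score. Repetitions, empty-bucket filling and the final projection described in the paper are omitted here.

    import numpy as np

    def simhash_buckets(vectors, hyperplanes):
        # Assign each token vector a bucket id from the sign pattern of k random hyperplanes.
        bits = (vectors @ hyperplanes.T) > 0                    # (n_tokens, k) booleans
        return bits @ (1 << np.arange(hyperplanes.shape[0]))    # bucket id per token

    def fde(token_vecs, hyperplanes, is_query):
        # One repetition of a fixed dimensional encoding: queries sum their tokens per
        # bucket, documents average them, then everything is flattened into one vector.
        k, d = hyperplanes.shape[0], token_vecs.shape[1]
        out = np.zeros((2 ** k, d))
        ids = simhash_buckets(token_vecs, hyperplanes)
        for b in range(2 ** k):
            members = token_vecs[ids == b]
            if len(members):
                out[b] = members.sum(0) if is_query else members.mean(0)
        return out.ravel()                                      # single flat vector for the ANN index

    rng = np.random.default_rng(0)
    planes = rng.normal(size=(3, 128))                          # k=3 -> 8 buckets, 128-dim tokens
    q = fde(rng.normal(size=(32, 128)), planes, is_query=True)
    doc = fde(rng.normal(size=(80, 128)), planes, is_query=False)
    print(q @ doc)                                              # approximates the multi-vector similarity

The point is that q and doc are now ordinary fixed-length vectors, so any off-the-shelf ANN index and quantization scheme applies unchanged.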
