Skip to content

feat: rework in-memory vector store to use Approx. Nearest Neighbour (ANN) #760

@joshua-mo-143

Description

@joshua-mo-143
  • I have looked for existing issues (including closed) about this

Feature Request

Currently, the in-memory vector store is a bit weak. We could really do with beefing it up. At the moment, the search is basically just brute-force cosine similarity searching for each and every single vector... which is really slow. By using methods that can calculate the ANN (Approximate Nearest Neighbour), we can add significant speedups to our implementation.

However to avoid adding in many more dependencies and ensure it can compile to WASM, we will likely need to write the algorithm from scratch. This is a non-trivial endeavor and as such we will need to ensure we have a way of benchmarking the old implementation vs the new implementation.

To start with, we should probably try implementing Locality Sensitive Hashing (LSH) first as it's the most simple to use.

I'll probably end up taking this ticket shortly if nobody else claims it because it's a non-trivial ticket that I would rather see through to completion myself.

Motivation

Performance speedup/optimisation

Proposal

Implement LSH and use it in the vector store for cosine similarity search.

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions