LSH

LSH, short for Locality-Sensitive Hashing, is one of the oldest families of approximate nearest neighbor (ANN) algorithms, dating back to a seminal 1998 paper by Indyk and Motwani that defined the formal framework. LSH uses hash functions designed so that similar vectors land in the same bucket with high probability, while dissimilar vectors usually land in different buckets. A query is then compared only against the items in its own buckets rather than the full dataset, which gives sublinear-time approximate similarity lookup. Common LSH variants include MinHash for Jaccard similarity (set overlap), SimHash for cosine similarity (often applied to text), and random projection LSH for Euclidean distance.

LSH was the dominant ANN approach in the early 2000s and underpins many large-scale deduplication pipelines, plagiarism detection systems, and clustering workflows. In modern vector database deployments LSH has been largely supplanted by graph-based methods like HNSW and quantization-based methods like IVF-PQ, which achieve better recall-vs-speed trade-offs. LSH retains specific niches in streaming and online settings where index updates must be cheap, and in deduplication, where finding every pair above a fixed similarity threshold matters more than top-k ranking.
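The bucketing idea can be sketched with random-hyperplane hashing, the sign-of-projection scheme behind SimHash-style cosine LSH. This is a minimal illustration using only the Python standard library, not a production index: the names `HyperplaneLSH`, `make_hyperplanes`, and `signature`, and the parameters `n_bits` and `n_tables`, are hypothetical and chosen for this example.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplane normals of dimension dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(p_i * v_i for p_i, v_i in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


class HyperplaneLSH:
    """Toy cosine-similarity LSH index with several independent hash tables."""

    def __init__(self, dim, n_bits=8, n_tables=4, seed=0):
        # Each table has its own hyperplanes; more tables raise recall,
        # more bits per table make buckets smaller and more selective.
        self.tables = [
            (make_hyperplanes(dim, n_bits, seed + t), defaultdict(list))
            for t in range(n_tables)
        ]

    def add(self, key, vec):
        """Insert a key into the bucket matching its signature in every table."""
        for planes, buckets in self.tables:
            buckets[signature(vec, planes)].append(key)

    def candidates(self, vec):
        """Return keys sharing a bucket with vec in at least one table."""
        found = set()
        for planes, buckets in self.tables:
            found.update(buckets.get(signature(vec, planes), []))
        return found


index = HyperplaneLSH(dim=3, n_bits=8, n_tables=4, seed=42)
index.add("doc-a", [1.0, 2.0, 3.0])
# A vector pointing in the same direction hashes to the same buckets.
print(index.candidates([2.0, 4.0, 6.0]))
```

Adding an item touches only one bucket per table, which is why updates stay cheap in streaming settings. The returned candidates are approximate, so a real pipeline would typically re-score them with the exact similarity measure before applying a threshold.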

LSH alongside modern indexes in Centralpoint: Centralpoint supports legacy LSH-based deduplication pipelines alongside modern HNSW-based vector search, routing each workload to the right backend under one model-agnostic governance layer. Tokens are metered centrally, prompts stay local, and chatbots powered by either approach deploy through one line of JavaScript.

Related Keywords:
LSH
