Product Quantization
Product Quantization, abbreviated PQ, is a vector compression technique introduced by Jégou, Douze, and Schmid in 2011 that divides a high-dimensional vector into m subvectors and represents each subvector by the index of its closest centroid in a small codebook learned via k-means. A 768-dimensional float32 vector occupying 3,072 bytes can be compressed to as few as 32 bytes with PQ, an enormous reduction that makes billion-scale vector search economically feasible. The trade-off is some loss of distance accuracy because compressed vectors no longer encode the original geometry exactly, but for most
RAG and recommendation workloads the recall loss is acceptable when paired with appropriate reranking. PQ is the foundation of IVF-PQ, OPQ (Optimized PQ), and many production vector indexes. AI governance teams evaluating PQ-based deployments typically validate recall against uncompressed baselines on representative queries before going live. The technique has been a core component of FAISS since its initial release and underpins the vector indexes of many vector databases.
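To make the mechanics concrete, the following is a minimal sketch of PQ training, encoding, and decoding using NumPy and scikit-learn. The dataset is synthetic and the parameter choices (m = 32 subvectors, 256 centroids per codebook, i.e. 8-bit codes) and the encode/decode helpers are illustrative assumptions, not the API of any particular library.

```python
# Minimal Product Quantization sketch (illustrative assumptions, not a production index).
import numpy as np
from sklearn.cluster import KMeans

d, m, k = 768, 32, 256        # vector dims, number of subvectors, centroids per codebook
ds = d // m                   # dimensions per subvector (24 here)

rng = np.random.default_rng(0)
train = rng.standard_normal((5_000, d)).astype(np.float32)  # synthetic training vectors

# Learn one k-means codebook per subspace.
codebooks = []
for j in range(m):
    sub = train[:, j * ds:(j + 1) * ds]
    km = KMeans(n_clusters=k, n_init=4, random_state=0).fit(sub)
    codebooks.append(km.cluster_centers_.astype(np.float32))

def encode(x):
    """Compress an (n, d) batch to (n, m) uint8 codes: 32 bytes per vector."""
    codes = np.empty((x.shape[0], m), dtype=np.uint8)
    for j in range(m):
        sub = x[:, j * ds:(j + 1) * ds]
        # Squared distance from each subvector to every centroid in this codebook.
        dists = ((sub[:, None, :] - codebooks[j][None, :, :]) ** 2).sum(-1)
        codes[:, j] = dists.argmin(1)
    return codes

def decode(codes):
    """Reconstruct approximate vectors from codes (lossy)."""
    return np.hstack([codebooks[j][codes[:, j]] for j in range(m)])

codes = encode(train[:5])
approx = decode(codes)
print(codes.nbytes, "bytes compressed vs", train[:5].nbytes, "bytes uncompressed")
```

With 256 centroids per codebook the compressed size is simply m bytes per vector, so m and the number of centroids are the knobs that trade memory for fidelity; FAISS exposes the same trade-off through the M and nbits parameters of its IndexPQ.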
Product Quantization with Centralpoint: Centralpoint operates above whatever quantization strategy your vector backend uses, metering retrieval-plus-generation tokens so the cost-quality trade-off is transparent. The model-agnostic platform routes generation to OpenAI, Anthropic, Gemini, or LLAMA, keeps prompts local, and embeds PQ-backed chatbots through one line of JavaScript.
Related Keywords:
Product Quantization