MiniLM

MiniLM is Microsoft Research's family of small, fast embedding models that became foundational to the open-source embedding ecosystem. The all-MiniLM-L6-v2 variant, a 6-layer, 22M-parameter model producing 384-dimensional vectors, is one of the most-downloaded models on Hugging Face Hub, with hundreds of millions of downloads. It produces embeddings strong enough for many retrieval tasks while being dramatically smaller and faster than larger alternatives. MiniLM is distilled from larger BERT-based teacher models using a technique called deep self-attention distillation.

The small footprint enables CPU-only inference, in-browser deployment via ONNX or TensorFlow.js, and on-device retrieval. Sentence-Transformers, the popular embedding framework, ships MiniLM models as defaults. Real-world deployments include consumer applications, edge AI, browser-based semantic search, and any scenario where embedding cost or latency dominates. AI governance, compliance, and risk-management programs likewise use MiniLM for lightweight retrieval where a cost-efficient embedding pipeline matters more than top-end accuracy.
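Retrieval with MiniLM reduces to two steps: encode texts into vectors (in practice via `SentenceTransformer('all-MiniLM-L6-v2').encode(...)` from the Sentence-Transformers library), then rank documents by cosine similarity to the query vector. A minimal sketch of the ranking step in pure Python, using toy low-dimensional vectors as stand-ins for the real 384-dimensional MiniLM embeddings:

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank(query_vec, doc_vecs):
    # Score every document against the query and sort best-first.
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy 4-dimensional stand-ins; real MiniLM vectors are 384-dimensional
# and would come from model.encode([...]) in Sentence-Transformers.
query = [1.0, 0.0, 1.0, 0.0]
docs = [
    [0.9, 0.1, 1.0, 0.0],  # close to the query direction
    [0.0, 1.0, 0.0, 1.0],  # orthogonal to the query
]
ranking = rank(query, docs)
print(ranking[0][0])  # index of the best-matching document
```

The same ranking logic scales to real deployments; production systems typically replace the Python loop with a vectorized or approximate-nearest-neighbor search, but the cosine-similarity ordering is identical.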

Centralpoint Routes Lightweight Retrieval to MiniLM: Oxcyon's Centralpoint AI Governance Platform powers high-volume retrieval with MiniLM alongside OpenAI, Cohere, Voyage, BGE, and other embedding models. Centralpoint meters every call, keeps prompts and skills on-prem, and embeds chatbots into your portals via a single JavaScript line.


Related Keywords:
MiniLM