Knowledge Distillation
Knowledge Distillation trains a smaller "student" model to mimic the behavior of a larger "teacher" model, transferring capability while dramatically reducing inference cost. The technique was introduced by Hinton, Vinyals, and Dean in 2015 and has become a standard tool for producing efficient, deployable models. The student is trained not only on the original labels but also on the teacher's soft probability outputs, which carry richer information than discrete labels; a minimal sketch of this training objective appears below.

Well-known examples include DistilBERT (40% smaller than BERT while retaining about 97% of its performance), the distilled Whisper variants, and Phi-3-mini, which drew on GPT-4-class teachers through synthetic training data. Distillation also underpins many proprietary inference-cost reduction strategies at major AI labs, and the technique applies to any model class: LLMs, vision models, and speech models. Because a distilled model inherits behavior from its teacher, AI governance, compliance, and risk-management programs typically document distillation lineage in model cards, supporting responsible, cost-optimized enterprise AI deployments at scale.
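To make the "soft targets" idea concrete, the following PyTorch-style snippet is a minimal sketch of the standard distillation loss from Hinton et al. (2015). The function name, temperature, and weighting values are illustrative assumptions, not taken from any particular production implementation.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          temperature=2.0, alpha=0.5):
        """Blend the teacher's soft targets with the original hard labels."""
        # Soft targets: KL divergence between temperature-softened distributions.
        soft = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * (temperature ** 2)  # rescale so gradient magnitude matches the hard-label term
        # Hard targets: ordinary cross-entropy on the ground-truth labels.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1.0 - alpha) * hard

    # Illustrative usage: a batch of 8 examples over 10 classes.
    student = torch.randn(8, 10)           # student logits
    teacher = torch.randn(8, 10)           # frozen teacher logits
    targets = torch.randint(0, 10, (8,))   # ground-truth class labels
    loss = distillation_loss(student, teacher, targets)

A higher temperature softens the teacher's distribution so the student can learn from the relative probabilities of wrong classes, and alpha balances how much the student follows the teacher versus the original labels.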
Centralpoint Supports Distilled Models Without Skipping Governance: Oxcyon's Centralpoint AI Governance Platform tracks every model interaction, distilled or full-size, across OpenAI, Gemini, Llama, and embedded models. Centralpoint meters consumption, keeps prompts and skills on-prem, and embeds chatbots into your portals with a single line of JavaScript.
Related Keywords:
Knowledge Distillation