LoRA
LoRA, short for Low-Rank Adaptation, is a parameter-efficient fine-tuning technique introduced by Microsoft Research in a 2021 paper by Hu et al., and it has become the dominant approach to adapting large language models for specific tasks. Rather than updating all of a model's billions of weights, LoRA freezes the base model and trains small low-rank decomposition matrices, typically amounting to 0.1-1% of the base model's parameter count, that are inserted alongside the attention (and sometimes other) weight matrices. The result is a small adapter file, often just a few megabytes, that is applied on top of the base model at inference time, dramatically reducing storage, training cost, and serving complexity. Because hundreds of task-specific adapters can share one base model in memory, LoRA makes multi-tenant LLM serving economically viable.
AI governance teams favor LoRA for domain adaptation because the small adapters are easy to audit, version, and revert compared with full fine-tuning. The technique is supported by every major open-source training framework, including Hugging Face PEFT, Unsloth, and Axolotl, and low-rank adaptation also underpins many commercial fine-tuning services, including offerings from OpenAI and Google.
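The core mechanic described above, a frozen weight matrix W plus a trainable low-rank update scaled by alpha/r, can be sketched in a few lines of pure Python. The names here (LoRALinear, A, B, r, alpha) are illustrative and not tied to any particular library; real implementations operate on framework tensors and initialize A randomly.

```python
# Minimal sketch of a LoRA-adapted linear layer (pure Python, no framework).
# Forward pass computes: y = W x + (alpha / r) * B (A x)

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

class LoRALinear:
    def __init__(self, W, r, alpha):
        self.W = W                                    # frozen base weight, shape (out, in)
        out_dim, in_dim = len(W), len(W[0])
        # In practice A is initialized randomly and B with zeros, so the
        # adapter is a no-op before training. Both zeroed here for determinism.
        self.A = [[0.0] * in_dim for _ in range(r)]   # (r, in)  - trained
        self.B = [[0.0] * r for _ in range(out_dim)]  # (out, r) - trained
        self.scale = alpha / r

    def forward(self, x):
        base = matvec(self.W, x)                      # frozen path
        update = matvec(self.B, matvec(self.A, x))    # low-rank adapter path
        return [b + self.scale * u for b, u in zip(base, update)]

# A rank-1 adapter on a 2x2 frozen weight: 2*(2*1) = 4 trainable values
# instead of updating all 4 base weights; the gap widens fast at scale.
layer = LoRALinear(W=[[1.0, 0.0], [0.0, 1.0]], r=1, alpha=2.0)
print(layer.forward([3.0, 4.0]))   # adapter still zero -> identity: [3.0, 4.0]
layer.A = [[1.0, 0.0]]             # pretend training produced these values
layer.B = [[0.0], [1.0]]
print(layer.forward([3.0, 4.0]))   # [3.0, 4.0 + 2.0*3.0] = [3.0, 10.0]
```

Shipping only A, B, and the scale is what makes the adapter file tiny: the base weight W never changes, so many adapters can be swapped over one copy of the model in memory.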
LoRA adapters governed by Centralpoint: Centralpoint stays model-agnostic across LoRA-adapted models from any provider, whether Llama, Mistral, Qwen, or other self-hostable open-weight families, and meters tokens per adapter so finance sees per-skill cost. Prompts and skills stay on-premise, and adapter-aware chatbots embed across portals with one line of JavaScript.
Related Keywords:
LoRA