Learning Rate

The learning rate is the hyperparameter that controls how large a step the optimizer takes in the direction of the negative gradient during training: too small and training is slow; too large and training diverges. Modern LLM training uses learning rate schedules rather than constant values: warmup from zero to a peak rate over the first few thousand steps, followed by decay (cosine, linear, or constant) toward zero.

Typical peak learning rates for LLM pretraining are 1e-4 to 3e-4, while fine-tuning uses lower values, roughly 1e-5 to 1e-4, to avoid disrupting pretrained representations. LoRA fine-tuning often uses much higher learning rates (1e-4 to 1e-3) because the small adapter weights need larger updates to learn meaningful task representations.

The learning rate is often the single most important hyperparameter to tune for training stability and quality, and tools like Optuna and Weights & Biases can automate the search for it. AI governance teams document the learning rate schedule alongside other training hyperparameters because reproducibility requires the full schedule, not just the peak value.
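The warmup-then-decay pattern described above can be sketched as a small scheduling function. This is an illustrative sketch, not any specific framework's implementation; the function name `lr_at_step` and the default values (peak 3e-4, 2,000 warmup steps, 100,000 total steps) are hypothetical placeholders chosen to match the ranges mentioned in the text.

```python
import math

def lr_at_step(step, peak_lr=3e-4, warmup_steps=2000,
               total_steps=100_000, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay toward min_lr.

    All defaults are illustrative, not values from any specific run.
    """
    if step < warmup_steps:
        # Linear warmup: ramp from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped past total_steps.
    progress = min((step - warmup_steps) / (total_steps - warmup_steps), 1.0)
    # Cosine factor falls smoothly from 1.0 down to 0.0.
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine
```

At step 0 the rate is zero, at the end of warmup it reaches the peak, and by `total_steps` it has decayed to `min_lr`; logging this function's output per step is one simple way to record the full schedule for reproducibility.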

Training-aware governance in Centralpoint: Centralpoint operates above whatever training pipeline produced your models, providing consistent metering and audit logging. The model-agnostic platform routes to OpenAI, Anthropic, Gemini, and LLAMA as well as embedded models, keeps prompts local, and deploys chatbots through one line of JavaScript with audit-ready governance.


Related Keywords:
Learning Rate