
Layer Normalization

Layer normalization, often abbreviated LayerNorm, is a normalization technique introduced by Ba, Kiros, and Hinton in 2016 that normalizes activations across the feature dimension within each token, stabilizing the training of deep networks such as Transformers. Unlike batch normalization, which normalizes across the batch dimension and degrades at small batch sizes, LayerNorm operates independently per token, making it well suited to variable-length sequence models. LayerNorm adds two learnable parameter vectors, a per-feature scale (gain) and shift (bias), applied after the normalization step. In modern pre-norm Transformer architectures the technique is applied twice per block: once before self-attention and once before the feed-forward network. The post-norm placement from the original 2017 Transformer paper, which normalized after each sub-layer, has fallen out of favor due to training instability at scale.

RMSNorm is a simpler variant that has largely replaced LayerNorm in modern LLMs such as Llama, Mistral, and Qwen: it drops the mean subtraction and the shift parameter, delivering comparable quality with slightly fewer parameters and less compute. AI governance teams document the normalization choice as part of model architecture lineage.
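The per-token computation can be sketched in a few lines. The NumPy snippet below is a minimal illustration, not a reference implementation; the function names, epsilon values, and toy tensor shapes are our own choices, and production frameworks provide fused, optimized versions of both operations. It shows standard LayerNorm alongside the RMSNorm variant mentioned above.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # x has shape (..., d_model); normalize over the last (feature) axis, per token.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Learnable per-feature scale (gamma) and shift (beta) are applied after normalization.
    return gamma * x_hat + beta

def rms_norm(x, gamma, eps=1e-6):
    # RMSNorm skips the mean subtraction and the shift parameter:
    # each token is rescaled by its root-mean-square activation, then scaled by gamma.
    rms = np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + eps)
    return gamma * (x / rms)

# Toy usage: a batch of 2 sequences, 3 tokens each, with a feature size of 4.
d_model = 4
x = np.random.randn(2, 3, d_model)
out_ln = layer_norm(x, gamma=np.ones(d_model), beta=np.zeros(d_model))
out_rms = rms_norm(x, gamma=np.ones(d_model))
```

Note that both functions reduce only over the last axis, so each token is normalized using its own statistics, independent of batch size or sequence length; this is the property that distinguishes LayerNorm from batch normalization in the paragraph above.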

Normalization-aware governance in Centralpoint: Centralpoint is model-agnostic and operates above whatever normalization variant powers your models, whether LayerNorm or RMSNorm. Tokens are metered consistently across the LLM stack, prompts stay local, and chatbots deploy through one line of JavaScript on any portal with audit-ready governance.


Related Keywords:
Layer Normalization