
Feed-Forward Network

The feed-forward network (FFN) is the second sublayer in each Transformer block (the first being multi-head attention); it applies a nonlinear transformation to each position independently. The classical FFN is a two-layer network: an expansion layer whose intermediate dimension is typically 4× the hidden dimension, a ReLU or GELU activation, and a projection back to the hidden dimension. Modern LLMs typically replace the classical FFN with SwiGLU, a gated linear variant that produces better quality at the same parameter count. The FFN accounts for the majority of a Transformer's parameters (roughly two-thirds of the total in a typical dense model), making it the dominant compute target for techniques like quantization, pruning, and mixture-of-experts (MoE). The MoE family replaces the dense FFN with a routing layer that selects a small subset of "expert" sub-FFNs per token, dramatically reducing active parameters during inference while maintaining total capacity. AI governance teams encounter FFN architecture choices in model lineage documentation; the specific FFN variant (dense vs. MoE, ReLU vs. GELU vs. SwiGLU) significantly affects compute, memory, and behavior.
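The two variants described above can be sketched in a few lines of NumPy. This is a minimal illustration, not any particular model's implementation: the weight shapes and the 4× expansion ratio follow the description above, and the names (`ffn_classic`, `ffn_swiglu`, etc.) are chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, n_pos = 8, 32, 5  # toy sizes; d_ff = 4 * d_model per the classical expansion ratio

def gelu(x):
    # Common tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def silu(x):
    # SiLU / swish: x * sigmoid(x); the activation used inside SwiGLU
    return x / (1.0 + np.exp(-x))

def ffn_classic(x, W1, b1, W2, b2):
    # Position-wise two-layer FFN: expand to d_ff, activate, project back to d_model
    return gelu(x @ W1 + b1) @ W2 + b2

def ffn_swiglu(x, W_gate, W_up, W_down):
    # Gated variant: SiLU(x W_gate) elementwise-scales (x W_up), then down-projects
    return (silu(x @ W_gate) * (x @ W_up)) @ W_down

# Each row of x is one token position; the FFN treats rows independently
x = rng.standard_normal((n_pos, d_model))
W1, b1 = rng.standard_normal((d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.standard_normal((d_ff, d_model)), np.zeros(d_model)
W_gate = rng.standard_normal((d_model, d_ff))
W_up = rng.standard_normal((d_model, d_ff))
W_down = rng.standard_normal((d_ff, d_model))

out_classic = ffn_classic(x, W1, b1, W2, b2)
out_swiglu = ffn_swiglu(x, W_gate, W_up, W_down)
print(out_classic.shape, out_swiglu.shape)  # both (5, 8): same shape in, same shape out
```

Note that the SwiGLU variant carries three weight matrices instead of two, which is why implementations that adopt it often shrink the intermediate dimension to keep the parameter count comparable.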

FFN-based models in Centralpoint: Centralpoint operates above whatever FFN variant powers your models — dense, MoE, SwiGLU — in a model-agnostic platform. Tokens are metered per skill and audience, prompts stay local, and chatbots deploy through one line of JavaScript with audit-ready governance.


Related Keywords:
Feed-Forward Network