
Constitutional AI

Constitutional AI (CAI) is an alignment approach introduced by Anthropic in a 2022 paper. It uses AI-generated critiques and revisions, guided by a written set of principles (the "constitution"), to reduce reliance on human feedback for harmlessness training. The technique has two phases: a supervised learning phase, in which the model critiques and revises its own outputs against the constitution and is fine-tuned on the revisions, and RL from AI Feedback (RLAIF), in which the model is fine-tuned with reinforcement learning against AI-generated preference labels. Anthropic's Claude is trained with Constitutional AI; the constitution is publicly documented in Anthropic's research and includes principles such as preferring responses that are "more harmless" and "more thoughtful". The technique reduces the human labeling burden compared to pure RLHF while maintaining or exceeding alignment quality on benchmarks. Constitutional AI has influenced subsequent work, including open-source alignment recipes, the Constitutional AI Foundation Model toolkit, and academic research on principle-based alignment. AI governance teams document constitution contents and training procedures as part of model lineage. The approach is one example of the broader trend toward scalable oversight techniques.
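The supervised phase described above can be sketched as a critique-revision loop. The example below is a minimal, illustrative sketch only: `call_model` is a hypothetical stand-in for a real LLM API, and its canned replies exist solely so the loop runs end to end; no real model or Anthropic API is invoked.

```python
# Hypothetical sketch of Constitutional AI's supervised phase: draft an
# answer, critique it against each constitutional principle, then revise.
# All function names and canned strings here are illustrative assumptions.

CONSTITUTION = [
    "Choose the response that is more harmless.",
    "Choose the response that is more thoughtful.",
]

def call_model(task: str, prompt: str) -> str:
    """Placeholder for an LLM call; a real system would query a model here."""
    canned = {
        "draft": "Here is a first draft answer.",
        "critique": "The draft could be more cautious about potential harm.",
        "revise": "Here is a revised, more careful answer.",
    }
    return canned[task]

def critique_and_revise(user_prompt: str, principles=CONSTITUTION) -> str:
    """One critique-revision pass per principle; in CAI's first phase the
    final revision becomes a supervised fine-tuning target."""
    draft = call_model("draft", user_prompt)
    for principle in principles:
        critique = call_model(
            "critique",
            f"Identify how this response conflicts with the principle "
            f"'{principle}':\n{draft}",
        )
        draft = call_model(
            "revise",
            f"Rewrite the response to address the critique "
            f"'{critique}':\n{draft}",
        )
    return draft

# In the second phase (RLAIF), pairs of responses are instead shown to an
# AI judge, which labels whichever better satisfies a principle; those
# preference labels replace human comparisons during RL fine-tuning.
```

In a real pipeline the loop above would be run over many prompts to build the fine-tuning dataset; the per-principle iteration mirrors how the paper samples critique and revision instructions from the constitution.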

Constitutional-AI-aligned models with Centralpoint: Centralpoint routes generation to constitutional-AI-aligned models including Claude alongside RLHF-aligned models in a model-agnostic stack. Tokens are metered per skill, prompts stay local, and aligned chatbots deploy through one line of JavaScript on any portal.


Related Keywords:
Constitutional AI