DeepSpeed

DeepSpeed is an open-source deep learning optimization library released by Microsoft Research in 2020 that provides memory-efficient training, distributed inference, and a suite of techniques for scaling LLM training. The library is best known for introducing ZeRO (Zero Redundancy Optimizer), which partitions optimizer states, gradients, and parameters across devices, but it also includes pipeline parallelism, expert parallelism for Mixture-of-Experts models, mixed-precision training, and the DeepSpeed-Chat pipeline for RLHF. DeepSpeed powered the training of Microsoft's Turing-NLG, the Megatron-Turing NLG collaboration with NVIDIA, and many other large models.

The library integrates with PyTorch and Hugging Face Transformers through a simple JSON configuration and is supported by downstream fine-tuning frameworks such as Axolotl as well as major commercial training platforms. DeepSpeed-Inference adds optimized serving for trained models, though vLLM and TensorRT-LLM have largely supplanted it for production inference. Because the configuration file captures optimizer, precision, and parallelism settings in one place, AI governance teams often include DeepSpeed configurations in their training reproducibility documentation. The library remains under active maintenance by Microsoft, with regular feature releases.
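In practice, DeepSpeed is enabled by passing that configuration (a JSON file or an equivalent Python dict) to deepspeed.initialize, which wraps the model in an engine that handles ZeRO partitioning, mixed precision, and gradient accumulation. The sketch below is illustrative only: the toy model, batch size, learning rate, and ZeRO stage are placeholder assumptions, and a real run would normally be started with the deepspeed launcher.

```python
# Minimal sketch, assuming a CUDA-capable machine and a run started with the
# `deepspeed` launcher; the model, batch size, and learning rate are placeholders.
import torch
import deepspeed

ds_config = {
    "train_batch_size": 32,
    "bf16": {"enabled": True},              # bfloat16 mixed precision
    "zero_optimization": {
        "stage": 2,                         # partition optimizer states and gradients
        "overlap_comm": True,
    },
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
}

model = torch.nn.Linear(1024, 1024)         # stand-in for a real network

# deepspeed.initialize returns an engine that owns the optimizer, precision,
# and ZeRO partitioning; training code calls the engine instead of the raw
# model and optimizer.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

x = torch.randn(4, 1024, dtype=torch.bfloat16, device=model_engine.device)
loss = model_engine(x).pow(2).mean()
model_engine.backward(loss)                 # replaces loss.backward()
model_engine.step()                         # replaces optimizer.step()
```

When training through Hugging Face Transformers instead, the same configuration can be passed with TrainingArguments(deepspeed="ds_config.json"), and the Trainer takes care of engine initialization.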

DeepSpeed-trained models with Centralpoint: Centralpoint operates above whatever training framework produced your models (DeepSpeed, FSDP, Megatron) with consistent metering across the LLM stack. The model-agnostic platform routes requests to OpenAI, Claude, Gemini, and Llama models, keeps prompts local, and deploys chatbots through one line of JavaScript.
