Product Introduction
All MPT-30B models share a set of features that differentiate them from other LLMs: an 8k-token context window during training, support for even longer contexts via ALiBi, and efficient inference and training performance through FlashAttention. Thanks to its pretraining data mixture, the MPT-30B series also has strong coding capabilities.

The extension to an 8k context window was performed on NVIDIA H100 GPUs, making MPT-30B (to our knowledge) the first LLM trained on H100s, and it is now available to MosaicML customers. The size of MPT-30B was also specifically chosen to make it easy to deploy on a single GPU: 1x NVIDIA A100-80GB in 16-bit precision, or 1x NVIDIA A100-40GB in 8-bit precision. Other comparable LLMs, such as Falcon-40B, have larger parameter counts and cannot (currently) be served on a single data-center GPU; they require two or more GPUs, which raises the minimum cost of an inference system. If you want to run MPT-30B in production, you can customize and deploy it in several ways through the MosaicML platform.
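To make the ALiBi mechanism mentioned above concrete, here is a minimal PyTorch sketch (an illustration, not MosaicML's actual implementation) of the core idea: instead of positional embeddings, ALiBi adds a fixed, linearly decaying per-head penalty to the attention scores based on query-key distance, and because that penalty is defined for any distance, the model can attend over sequences longer than those seen during training.

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Return a (n_heads, seq_len, seq_len) additive bias for causal attention."""
    # Per-head slopes form a geometric sequence, as in the ALiBi paper
    # (this simple form assumes n_heads is a power of two).
    start = 2 ** (-8 / n_heads)
    slopes = torch.tensor([start ** (i + 1) for i in range(n_heads)])
    # distance[i, j] = j - i; it is <= 0 for the past positions a causal
    # model may attend to, so slope * distance is a growing penalty.
    pos = torch.arange(seq_len)
    distance = pos[None, :] - pos[:, None]           # (seq_len, seq_len)
    bias = slopes[:, None, None] * distance[None]    # (n_heads, L, L)
    # Mask out future positions so the bias doubles as a causal mask.
    return bias.masked_fill(distance > 0, float("-inf"))

# Usage: add the bias to the raw attention logits before the softmax.
scores = torch.randn(1, 16, 128, 128)                # (batch, heads, L, L)
probs = (scores + alibi_bias(n_heads=16, seq_len=128)).softmax(dim=-1)
```

Because the bias depends only on relative distance, the same function can be called with a larger seq_len at inference time than was used in training, which is what enables the longer-context support described above.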
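The single-GPU sizing claim is also easy to try outside the MosaicML platform. The sketch below shows one possible route, assuming the Hugging Face transformers checkpoint mosaicml/mpt-30b: loading in 16-bit precision takes roughly 60 GB of weights (fitting a single A100-80GB), while 8-bit quantization via the bitsandbytes library takes roughly 30 GB (fitting an A100-40GB).

```python
import torch
import transformers

name = "mosaicml/mpt-30b"

# 16-bit weights (~2 bytes/parameter, ~60 GB) for 1x NVIDIA A100-80GB.
model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,   # MPT models ship custom modeling code
    device_map="auto",        # place the weights on the available GPU
)

# For 1x NVIDIA A100-40GB, load in 8-bit (~30 GB) via bitsandbytes instead:
# model = transformers.AutoModelForCausalLM.from_pretrained(
#     name, load_in_8bit=True, trust_remote_code=True, device_map="auto")

tokenizer = transformers.AutoTokenizer.from_pretrained(name)
inputs = tokenizer("MPT-30B is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

This is only a quick local test; for production traffic the MosaicML platform's deployment options mentioned above handle serving, scaling, and customization for you.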