Sparsetral [MODEL RELEASE]

Introducing Sparsetral, a sparse MoE model built from the dense Mistral model. For more on the theory, see the original paper, "Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks." The original repo accompanies the paper, and the forked repo from SERP AI adds sparsetral (mistral) integration 👉 Get it here: forked repo. We also forked unsloth and vLLM for efficient training and inference.
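If you just want to try the model, here's a minimal sketch of loading it with the standard Hugging Face transformers API. The repo id is a placeholder (check the model card linked above for the actual one), and `trust_remote_code` is assumed to be needed for the custom sparse-MoE modeling code.

```python
# Minimal sketch: loading Sparsetral with Hugging Face transformers.
# NOTE: "serpai/sparsetral" is a placeholder repo id -- use the id from the
# actual model card in the release links above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "serpai/sparsetral"  # placeholder / assumption

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",        # spread layers across available GPUs
    trust_remote_code=True,   # assumed: custom MoE modeling code ships with the repo
)

prompt = "Explain what a sparse mixture-of-experts model is."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For higher-throughput serving, the forked vLLM mentioned above is the intended path; the snippet here is just the plain transformers route.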
