Llama 2 vs. Llama 3 vs. Mistral 7B, quantized with GPTQ and Bitsandbytes
With quantization, we can reduce the size of large language models (LLMs) by storing their weights at a lower precision. Quantization effectively serves as a compression method for LLMs: quantized models are easier to run on GPUs with less memory.
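To get a feel for the savings, here is a rough back-of-the-envelope sketch of how much memory the weights alone would need at different precisions. It assumes a round 7 billion parameters (an illustrative figure, not the exact count of any of the models compared here) and ignores the extra storage quantization formats need for scales and zero points, as well as activations and the KV cache. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes).

    Ignores quantization metadata (scales, zero points), activations,
    and the KV cache, so real usage will be somewhat higher.
    """
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: a round 7 billion parameters for a "7B" model.
NUM_PARAMS = 7e9

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

Under these assumptions, going from 16-bit to 4-bit weights cuts the weight footprint by roughly a factor of four, which is what can make a 7B model fit on a GPU with much less memory.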