ollama
Understand the model names
7b: number of parameters, in billions.
- More parameters means the model can understand and generate more complex language and can handle larger context windows.
instruct: model specifically trained to follow instructions.
fp16: 16-bit floating-point weights; this format uses less memory for faster computation.
q4: quantization; optimized to use less memory and run faster by simplifying the data representation.
- The number (4) indicates the level of quantization: the higher the number, the less aggressive the quantization, i.e. more accurate but slower (longer response time).
- The letters (e.g. K, M, L) indicate the specific method and variant used for the quantization.
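The naming scheme above can be decomposed mechanically. A minimal shell sketch, using the hypothetical tag `7b-instruct-q4_K_M` for illustration, splits a tag into its parts and shows the rough memory math behind quantization (parameters × bits-per-weight ÷ 8):

```shell
# Decompose an Ollama tag like "7b-instruct-q4_K_M" into its parts.
tag="7b-instruct-q4_K_M"          # hypothetical tag, for illustration only

size="${tag%%-*}"                 # "7b"       -> parameter count in billions
rest="${tag#*-}"                  # "instruct-q4_K_M"
variant="${rest%%-*}"             # "instruct" -> fine-tune variant
quant="${rest#*-}"                # "q4_K_M"   -> quantization scheme

echo "size=$size variant=$variant quant=$quant"

# Rough memory estimate: 7B weights at 4-bit quantization
# ~= 7e9 params * 4 bits / 8 bits-per-byte = 3.5e9 bytes.
params=7000000000
bits=4
bytes=$(( params * bits / 8 ))
echo "approx $(( bytes / 1000000 )) MB of weights"
```

The same 7b model at fp16 would need roughly four times as much memory (16 bits per weight instead of 4), which is why the quantized tags are the only practical choice on machines without a large GPU.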
Small models
Best Small Language Models for Accuracy and Enterprise Use Cases
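Each model below was tried locally through the ollama CLI. A typical session, assuming ollama is installed and the tag exists on the registry (the snippet is guarded so it degrades gracefully otherwise), looks like:

```shell
# Typical workflow for trying one of the models below.
# Tag is an example; any model:tag pair from the notes works the same way.
model="qwen2.5-coder:1.5b"

if command -v ollama >/dev/null 2>&1; then
  ollama pull "$model" || true               # download the quantized weights
  ollama run "$model" "Say hello" || true    # one-shot prompt, non-interactive
  status="ran"
else
  status="skipped (ollama not installed)"
fi
echo "$status"
```

`ollama list` then shows what is installed locally, and `ollama rm <model>` frees the disk space of the models that failed the test.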
codellama ❌
instruct
: Fine-tuned to generate helpful and safe answers in natural language
- 7b-instruct-q4_0: a bit slow
- 7b-instruct-q2_K: a bit faster than q4_0
code
: Base model for code completion
- 7b-code-q2_K
Fail
- Slow and unusable on a computer without a good GPU (like mine).
deepseek-r1 🤔
🏷️ 1.5b
Success
- Fast.
- Has the “thinking” feature.
Fail
- Lots of hallucinations.
gemma3
General purpose
🏷️ 1b
Success
- Fast.
- Relatively good suggestions.
phi3.5
🏷️ 3.8b-mini-instruct-q4_0
Success
- Relatively good and exhaustive output.
mistral ❌
🏷️ 7b-instruct-v0.3-q4_1
Fail
- Slow and unusable on a computer without a good GPU (like mine).
qwen2.5
General purpose.
🏷️ 1.5b
Success
- Fast.
- Gives good suggestions.
Fail
- Does not fully follow the instructions, e.g. it still explains the why despite being told not to.
qwen2.5-coder
🏷️ 0.5b ❌
Success
- Really fast.
Fail
- Does not produce anything good or accurate; most of the time it just outputs the same content as the input.
🏷️ 1.5b
Success
- Fast.
- Gives good suggestions.
🏷️ 3b