ollama

Understand the model names

  • 7b: number of parameters in billions.
    • More parameters generally means the model can understand and generate more complex language and handle larger context windows, at the cost of memory and speed.
  • instruct: model specifically trained to follow instructions.
  • fp16: 16-bit floating point; uses less memory than full 32-bit precision, allowing faster computation.
  • q4: quantization; weights are stored with fewer bits so the model uses less memory and runs faster.
    • The number is the bits used per weight: the lower the number, the more aggressive the quantization, i.e. smaller and faster but less accurate; higher numbers (e.g. q8) are more accurate but larger and slower.
  • K, L, M: specific method used for the quantization.
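As a rough sketch, the naming convention above can be parsed programmatically. The helper below is hypothetical (not part of ollama), and the memory figure is only a back-of-the-envelope estimate of weight storage: parameters × bits per weight / 8.

```python
import re

def parse_model_tag(tag: str) -> dict:
    """Split an ollama-style tag like '7b-instruct-q4_0' into its parts.

    Hypothetical helper illustrating the naming convention; not an ollama API.
    """
    info = {"params_b": None, "variant": None, "quant": None}
    for part in tag.split("-"):
        m = re.fullmatch(r"(\d+(?:\.\d+)?)b", part)
        if m:
            info["params_b"] = float(m.group(1))        # e.g. 7b -> 7.0
        elif part == "fp16" or (part.startswith("q") and part[1:2].isdigit()):
            info["quant"] = part                        # e.g. q4_0, q2_K, fp16
        else:
            info["variant"] = part                      # e.g. instruct, code
    return info

def approx_weight_memory_gb(params_b: float, bits_per_weight: int) -> float:
    """Very rough estimate of weight memory: parameters x bits / 8, in GB."""
    return params_b * bits_per_weight / 8

info = parse_model_tag("7b-instruct-q4_0")
print(info)
print(approx_weight_memory_gb(7, 4))  # a 7b model at 4-bit: ~3.5 GB of weights
```

By this estimate, a 7b model at q4 needs roughly 3.5 GB just for weights, which is why the q2 variants below feel noticeably faster on machines without a good GPU.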

Small models

Best Small Language Models for Accuracy and Enterprise Use Cases

codellama

instruct: Fine-tuned to generate helpful and safe answers in natural language

  • 7b-instruct-q4_0: a bit slow
  • 7b-instruct-q2_K: a bit faster than q4_0

code: Base model for code completion

  • 7b-code-q2_K

Fail

  • Slow and unusable on a computer without a good GPU (like mine).

deepseek-r1 🤔

🏷️ 1.5b

Success

  • Fast.
  • Has the “thinking” feature.

Fail

  • Lots of hallucinations.

gemma3

General purpose

🏷️ 1b

Success

  • Fast.
  • Relatively good suggestions.

phi3.5

🏷️ 3.8b-mini-instruct-q4_0

Success

  • Relatively good and exhaustive output.

mistral

🏷️ 7b-instruct-v0.3-q4_1

Fail

  • Slow and unusable on a computer without a good GPU (like mine).

qwen2.5

General purpose.

🏷️ 1.5b

Success

  • Fast.
  • Gives good suggestions.

Fail

  • Does not fully follow the instructions, e.g. still explains the “why” despite being told not to.

qwen2.5-coder

🏷️ 0.5b ❌

Success

  • Really fast.

Fail

  • Does not produce anything good or accurate; most of the time it just outputs the same content as the input.

🏷️ 1.5b

Success

  • Fast.
  • Gives good suggestions.

🏷️ 3b