ollama

Understand the model names

  • 7b: number of parameters in billions.
    • More parameters generally means the model can understand and generate more complex language and handle larger context windows, at the cost of memory and speed.
  • instruct: model specifically trained to follow instructions.
  • fp16: 16-bit floating point; uses less memory than full 32-bit precision, allowing faster computation.
  • q4: quantization; weights are stored with fewer bits so the model uses less memory and runs faster.
    • The number is the bits used per weight: the lower the number, the more aggressive the quantization, i.e. smaller and faster but less accurate; higher numbers (e.g. q8) are more accurate but larger and slower.
  • K, L, M: specific method used for the quantization.
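As a rough sketch, the naming convention above can be parsed programmatically. The helper below is hypothetical (not part of ollama), and the memory figure is only a back-of-the-envelope estimate of weight storage: parameters × bits per weight / 8.

```python
import re

def parse_model_tag(tag: str) -> dict:
    """Split an ollama-style tag like '7b-instruct-q4_0' into its parts.

    Hypothetical helper illustrating the naming convention; not an ollama API.
    """
    info = {"params_b": None, "variant": None, "quant": None}
    for part in tag.split("-"):
        m = re.fullmatch(r"(\d+(?:\.\d+)?)b", part)
        if m:
            info["params_b"] = float(m.group(1))        # e.g. 7b -> 7.0
        elif part == "fp16" or (part.startswith("q") and part[1:2].isdigit()):
            info["quant"] = part                        # e.g. q4_0, q2_K, fp16
        else:
            info["variant"] = part                      # e.g. instruct, code
    return info

def approx_weight_memory_gb(params_b: float, bits_per_weight: int) -> float:
    """Very rough estimate of weight memory: parameters x bits / 8, in GB."""
    return params_b * bits_per_weight / 8

info = parse_model_tag("7b-instruct-q4_0")
print(info)
print(approx_weight_memory_gb(7, 4))  # a 7b model at 4-bit: ~3.5 GB of weights
```

By this estimate, a 7b model at q4 needs roughly 3.5 GB just for weights, which is why the q2 variants below feel noticeably faster on machines without a good GPU.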

Small models

Best Small Language Models for Accuracy and Enterprise Use Cases

codellama

instruct: Fine-tuned to generate helpful and safe answers in natural language

  • 7b-instruct-q4_0: a bit slow
  • 7b-instruct-q2_K: a bit faster than q4_0

code: Base model for code completion

  • 7b-code-q2_K

Fail

  • Slow and unusable on a computer without a good GPU (like mine).

deepseek-r1 🤔

🏷️ 1.5b

Success

  • Fast.
  • Has the “thinking” feature.

Fail

  • Lots of hallucinations.

gemma3

General purpose

🏷️ 1b

Success

  • Fast.
  • Relatively good suggestions.

phi3.5

🏷️ 3.8b-mini-instruct-q4_0

Success

  • Relatively good and exhaustive output.

mistral

🏷️ 7b-instruct-v0.3-q4_1

Fail

  • Slow and unusable on a computer without a good GPU (like mine).

qwen2.5

General purpose.

🏷️ 1.5b

Success

  • Fast.
  • Gives good suggestions.

Fail

  • Does not fully follow the instructions, e.g. still explains the “why” despite being told not to.

qwen2.5-coder

🏷️ 0.5b ❌

Success

  • Really fast.

Fail

  • Does not produce anything good or accurate; most of the time it just outputs the same content as the input.

🏷️ 1.5b

Success

  • Fast.
  • Gives good suggestions.

🏷️ 3b