Local AI models

Understand the model names

  • 7b: number of parameters in billions.
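    • More parameters generally means the model can understand and generate more complex language, but it also needs more memory and responds more slowly. The context window size is a separate property of the model, not a function of the parameter count.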
  • instruct: model specifically trained to follow instructions.
  • fp16: weights stored as 16-bit floating-point numbers, which takes half the memory of 32-bit floats and usually computes faster.
  • q4: quantization, i.e. weights stored at reduced precision so the model uses less memory and runs faster, at some cost in accuracy.
    • The number is the approximate number of bits per weight. The lower the number, the more aggressive the quantization: q4 is smaller and faster but less accurate, q8 is larger and slower but closer to the original model.
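  • K, S, M, L: specific method used for the quantization. In names such as q4_K_M, K denotes the "k-quant" family, and S, M, L denote the small, medium, and large variants of that level (larger variants keep more precision and take more space).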
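
To make the naming scheme above concrete, here is a minimal Python sketch that splits a model tag into its parts and gives a rough estimate of the memory needed for the weights. The tag "example-7b-instruct-q4_K_M" and the helper names parse_model_tag and estimate_weight_gb are made up for illustration, and the "-" separated layout is an assumption; real tags vary between tools. The estimate is simply parameters × bits ÷ 8, so it ignores runtime overhead, and real quantized files are usually somewhat larger than the nominal bit count suggests.

```python
import re


def parse_model_tag(tag: str) -> dict:
    """Split a hypothetical tag like 'example-7b-instruct-q4_K_M' into parts."""
    info = {"params_billion": None, "instruct": False, "bits": None, "variant": None}
    for part in tag.split("-"):
        if m := re.fullmatch(r"(\d+(?:\.\d+)?)b", part, re.IGNORECASE):
            # '7b' -> 7 billion parameters
            info["params_billion"] = float(m.group(1))
        elif part.lower() == "instruct":
            info["instruct"] = True
        elif m := re.fullmatch(r"fp(\d+)", part, re.IGNORECASE):
            # 'fp16' -> 16-bit floating-point weights
            info["bits"] = int(m.group(1))
        elif m := re.fullmatch(r"q(\d+)(?:_(\w+))?", part, re.IGNORECASE):
            # 'q4_K_M' -> about 4 bits per weight, variant 'K_M'
            info["bits"] = int(m.group(1))
            info["variant"] = m.group(2)
    return info


def estimate_weight_gb(params_billion: float, bits: int) -> float:
    """Rough size of the weights alone, in GB (1 GB = 10^9 bytes)."""
    return params_billion * bits / 8


if __name__ == "__main__":
    info = parse_model_tag("example-7b-instruct-q4_K_M")
    print(info)
    print("fp16:", estimate_weight_gb(7, 16), "GB")  # 7 * 16 / 8 = 14.0
    print("q4:  ", estimate_weight_gb(7, 4), "GB")   # 7 * 4 / 8 = 3.5
```

By this estimate a 7b model drops from roughly 14 GB at fp16 to roughly 3.5 GB at q4, which is why quantized variants are the usual choice on consumer hardware.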

References