Local AI models
Understand the model names
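- 7b: number of parameters, in billions.
  - More parameters generally means the model can understand and generate more complex language, but it also needs more memory and responds more slowly.
- instruct: model specifically trained to follow instructions.
- fp16: weights stored as 16-bit floating-point numbers, a format that uses less memory than 32-bit floats and allows faster computation.
- q4: quantization, which makes the model use less memory and run faster by storing the weights at lower precision.
  - The number 4 is the quantization level, roughly the number of bits per weight. The lower the number, the more aggressive the quantization: smaller and faster, but less accurate. The higher the number, the more accurate the model, but the less responsive it is (slower response time).
- K, L, M: specific method used for the quantization. For example, in llama.cpp GGUF names such as q4_K_M, K denotes the k-quant method and S, M, L denote the small, medium and large variants.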
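To make the naming scheme concrete, here is a minimal Python sketch that splits a tag into the components listed above and gives a rough weights-only memory estimate (parameters × bits per weight ÷ 8). The tag format `7b-instruct-q4_K_M` and the helper names `parse_model_tag` and `estimate_weight_gb` are illustrative assumptions, not part of any tool's API. Real quantized files store slightly more than the nominal bits per weight and also need memory for the context, so treat the result as a lower bound.

```python
import re

# Nominal bits per weight for unquantized formats (illustrative assumption).
NOMINAL_BITS = {"fp16": 16, "fp32": 32}


def parse_model_tag(tag: str) -> dict:
    """Split a hypothetical tag such as '7b-instruct-q4_K_M' into its parts."""
    info = {"params_billion": None, "instruct": False, "bits": None, "variant": None}
    for part in tag.lower().split("-"):
        if m := re.fullmatch(r"(\d+(?:\.\d+)?)b", part):
            # '7b' -> 7 billion parameters
            info["params_billion"] = float(m.group(1))
        elif part == "instruct":
            info["instruct"] = True
        elif part in NOMINAL_BITS:
            # 'fp16' -> 16 bits per weight
            info["bits"] = NOMINAL_BITS[part]
        elif m := re.fullmatch(r"q(\d+)(?:_(.+))?", part):
            # 'q4_k_m' -> 4 bits per weight, variant 'K_M'
            info["bits"] = int(m.group(1))
            info["variant"] = m.group(2).upper() if m.group(2) else None
    return info


def estimate_weight_gb(params_billion: float, bits: int) -> float:
    """Weights-only estimate in GB (10^9 bytes): parameters * bits / 8."""
    return params_billion * bits / 8


if __name__ == "__main__":
    info = parse_model_tag("7b-instruct-q4_K_M")
    print(info)
    print(f"{estimate_weight_gb(info['params_billion'], info['bits']):.1f} GB")  # 3.5 GB
    print(f"{estimate_weight_gb(7, 16):.1f} GB")  # 14.0 GB for the same model in fp16
```

The comparison shows why quantization matters locally: by this estimate the same 7b model needs about a quarter of the memory at q4 that it needs at fp16.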
References
- AI’s Plummeting Prices Are a Software Story, Not a Hardware One
- GitHub - cuolm/pi-sbx-llamacpp: Run Pi coding agent isolated in a Docker Sandbox microVM with a local llama-server as the inference backend
- Running local models is good now | ✰Vicki Boykis✰
- Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding? | Hacker News
- Local Qwen isn’t a worse Opus, it’s a different tool
- Humble Pi — local agentic coding on minimal hardware · GitHub
- 2026-06-12 - How to Setup a Local Coding Agent on macOS - Kyle Howells
- 2026-06-16 - Fine Tuning a Local LLM to Categorize Questions
- 2026-06-27 - Using Local Coding Agents - by Sebastian Raschka, PhD
- 2026-06-29 - Qwen 3.6 27B is the sweet spot for local development - Quesma Blog
- Ornith-1.0: Self-Scaffolding LLMs for Agentic Coding | DeepReinforce Blog | Jun. 2026