
📰 Reddit r/LocalLLaMA

Articles from Reddit r/LocalLLaMA · 31 articles · Updated every 3 hours

Reddit r/LocalLLaMA 12h ago
Are i-Quants overrated?
We all know modern "intelligent" quantization that uses an imatrix to make a Q4_K_XL model feel like a Q6_K. But here is what I notice: while this works well…
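For context on the technique the post questions: imatrix ("importance matrix") quantization in llama.cpp first measures which weights matter most on a calibration corpus, then uses that information when quantizing so the salient weights keep more precision. A minimal sketch using llama.cpp's bundled tools — the filenames and calibration text are hypothetical placeholders:

```shell
# 1) Collect an importance matrix from a calibration corpus
#    (model-f16.gguf and calibration.txt are placeholder names).
llama-imatrix -m model-f16.gguf -f calibration.txt -o model.imatrix

# 2) Quantize with the importance matrix so the weights the
#    calibration run marked as important lose less precision.
llama-quantize --imatrix model.imatrix model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```

The quant type at the end (here Q4_K_M) is whatever target the uploader chose; community "XL" variants layer their own per-tensor type choices on top of the same imatrix mechanism.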
Reddit r/LocalLLaMA 12h ago
DDTree - Another layer of speed up on top of Dflash.
This is getting ridiculous. DDTree paper: https://liranringel.github.io/ddtree/DDTree.pdf submitted by /u/Thrumpwart
Reddit r/LocalLLaMA 🧠 Large Language Models ⚡ AI Lesson 12h ago
We benchmarked TranslateGemma-12b against 5 frontier LLMs on subtitle translation - it won across the board, with one significant catch
Reddit r/LocalLLaMA 12h ago
Updated Qwen3.5-9B Quantization Comparison
This is a KLD eval across community GGUF quants
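A KLD eval compares, token position by token position, the output distribution of a quantized model against the full-precision reference; lower divergence means the quant tracks the original more faithfully (llama.cpp's perplexity tool can compute this at scale via its `--kl-divergence` mode). A minimal per-position sketch in pure Python — the function names are illustrative, not the eval's actual code:

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p_logits, q_logits):
    """KL(P || Q) in nats at one token position: how much the
    quantized distribution Q diverges from the reference P."""
    p = softmax(p_logits)
    q = softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Identical logits -> zero divergence.
print(kl_divergence([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # → 0.0
```

In a real eval this is averaged over every token of a test corpus, one value per quant, which is what makes the quants directly comparable.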
Reddit r/LocalLLaMA 13h ago
Share your speculative settings for llama.cpp and Gemma4
I have totally missed the boat on speculative decoding. Today, when generating some code again for the frontend, I found myself staring down at some quite…
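For anyone else who missed the boat: speculative decoding in llama.cpp pairs the target model with a small draft model that proposes several tokens ahead, which the target then verifies in one batched pass. A minimal llama-server invocation, with hypothetical model filenames:

```shell
# Target model (-m) answers; draft model (-md) speculates ahead.
# Filenames are placeholders for whatever GGUF pair you use.
llama-server \
  -m gemma4-27b-Q5_K_M.gguf \
  -md gemma4-1b-Q8_0.gguf \
  --draft-max 16 \
  --draft-min 1
```

The win depends on how often the draft's guesses are accepted; repetitive output like boilerplate code tends to accept long drafts, which is exactly the case the post describes.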
Reddit r/LocalLLaMA 13h ago
How to Distill from 100B+ to <4B Models
submitted by /u/cmpatino_
Reddit r/LocalLLaMA 15h ago
MiniMax m2.7 under 64gb for Macs - 91% MMLU
Reddit r/LocalLLaMA 16h ago
I laughed so hard at these posts side by side (sorry for the low effort post)
Reddit r/LocalLLaMA 17h ago
oMLX just implemented DFlash
https://github.com/jundot/omlx/commit/28fab9fc28f0c0013ffb307f3b21d30658ae1a72 submitted by /u/butterfly_labs
Reddit r/LocalLLaMA 18h ago
If it works - don’t touch it: COMPETITION
Reddit r/LocalLLaMA 19h ago
Update: I fine-tuned Qwen3.5-0.8B for OCR and it outperforms my previous 2B release [GGUF]
Hey everyone! A while ago I shared my fine-tuned Qwen3.5-2B OCR model. Since then I kept working on the pipeline and just released a new version based on Qwen3…
Reddit r/LocalLLaMA 22h ago
NEO-unify — A 2B multimodal model with no Vision Encoder, no VAE. Open source coming "hopefully not too long"
Reddit r/LocalLLaMA 23h ago
Please stop using AI for posts and showcasing your completely vibe coded projects
I get AI-assisted coding, and yes, I have AI ASSIST me. It gets to a point, though, because I can't come on here without seeing a fully AI-coded project…
Reddit r/LocalLLaMA 1d ago
common/gemma4 : handle parsing edge cases by aldehir · Pull Request #21760 · ggml-org/llama.cpp
Reddit r/LocalLLaMA 1d ago
PSA: Having issues with Qwen3.5 overthinking? Give it a tool, and it can help dramatically.
I'm sure everyone has seen the posts from people talking about Qwen 3.5 overthinking, or maybe you've experienced it yourself. Considering we're like 2 months…
Reddit r/LocalLLaMA 1d ago
Trained a 125M LM from scratch instead of fine-tuning GPT-2 — releasing weights + SFT framework for others to build on
Trained a 125M LM from scratch (custom tokenizer) and released an instruct checkpoint and SFT framework so others can fine-tune their own variants. I've been experimenting…
Reddit r/LocalLLaMA 1d ago
Best Local LLMs - Apr 2026
We're back with another Best Local LLMs Megathread! We have continued feasting in the months since the previous thread, with the much anticipated release of the…
Reddit r/LocalLLaMA 1d ago
Ram-air setup and window vent for 1100w capable AI box
So I have a very powerful setup here and I got tired…
Reddit r/LocalLLaMA 1d ago
Follow up post, decided to build the 2x RTX PRO 6000 tower.
Decided to put the effort in and merge my two…
Reddit r/LocalLLaMA 1d ago
Is there anything better than Qwen3.5-27B-UD-Q5_K_XL for coding?
I have a 5090, so my VRAM is limited to 32GB, but I find that Qwen3.5-27B-UD-Q5_K_XL with opencode (and mmproj) does a pretty good job for my use case (mainly…