
📰 Reddit r/LocalLLaMA

Articles from Reddit r/LocalLLaMA · 31 articles · Updated every 3 hours

Reddit r/LocalLLaMA 12h ago
Are i-Quants overrated?
We all know modern "intelligent" quantization that uses an imatrix to make a Q4_K_XL model feel like a Q6_K. But here is what I notice: while this works well…
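For context on the technique the post questions: imatrix ("importance matrix") quantization in llama.cpp first measures which weights matter most on a calibration corpus, then uses that information when quantizing so the salient weights keep more precision. A minimal sketch using llama.cpp's bundled tools — the filenames and calibration text are hypothetical placeholders:

```shell
# 1) Collect an importance matrix from a calibration corpus
#    (model-f16.gguf and calibration.txt are placeholder names).
llama-imatrix -m model-f16.gguf -f calibration.txt -o model.imatrix

# 2) Quantize with the importance matrix so the weights the
#    calibration run marked as important lose less precision.
llama-quantize --imatrix model.imatrix model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```

The quant type at the end (here Q4_K_M) is whatever target the uploader chose; community "XL" variants layer their own per-tensor type choices on top of the same imatrix mechanism.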
Reddit r/LocalLLaMA 12h ago
DDTree - Another layer of speed up on top of Dflash.
This is getting ridiculous. DDTree paper: https://liranringel.github.io/ddtree/DDTree.pdf submitted by /u/Thrumpwart
Reddit r/LocalLLaMA 🧠 Large Language Models ⚡ AI Lesson 12h ago
We benchmarked TranslateGemma-12b against 5 frontier LLMs on subtitle translation - it won across the board, with one significant catch
Reddit r/LocalLLaMA 12h ago
Updated Qwen3.5-9B Quantization Comparison
This is a KLD eval across community GGUF quants
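A KLD eval compares, token position by token position, the output distribution of a quantized model against the full-precision reference; lower divergence means the quant tracks the original more faithfully (llama.cpp's perplexity tool can compute this at scale via its `--kl-divergence` mode). A minimal per-position sketch in pure Python — the function names are illustrative, not the eval's actual code:

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p_logits, q_logits):
    """KL(P || Q) in nats at one token position: how much the
    quantized distribution Q diverges from the reference P."""
    p = softmax(p_logits)
    q = softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Identical logits -> zero divergence.
print(kl_divergence([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # → 0.0
```

In a real eval this is averaged over every token of a test corpus, one value per quant, which is what makes the quants directly comparable.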
Reddit r/LocalLLaMA 13h ago
Share your speculative settings for llama.cpp and Gemma4
I have totally missed the boat on speculative decoding. Today, when generating some code again for the frontend, I found myself staring down at some quite…
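For anyone else who missed the boat: speculative decoding in llama.cpp pairs the target model with a small draft model that proposes several tokens ahead, which the target then verifies in one batched pass. A minimal llama-server invocation, with hypothetical model filenames:

```shell
# Target model (-m) answers; draft model (-md) speculates ahead.
# Filenames are placeholders for whatever GGUF pair you use.
llama-server \
  -m gemma4-27b-Q5_K_M.gguf \
  -md gemma4-1b-Q8_0.gguf \
  --draft-max 16 \
  --draft-min 1
```

The win depends on how often the draft's guesses are accepted; repetitive output like boilerplate code tends to accept long drafts, which is exactly the case the post describes.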
Reddit r/LocalLLaMA 13h ago
How to Distill from 100B+ to <4B Models
submitted by /u/cmpatino_
Reddit r/LocalLLaMA 15h ago
MiniMax m2.7 under 64gb for Macs - 91% MMLU
Reddit r/LocalLLaMA 16h ago
I laughed so hard at these posts side by side (sorry for the low effort post)
Reddit r/LocalLLaMA 17h ago
oMLX just implemented DFlash
https://github.com/jundot/omlx/commit/28fab9fc28f0c0013ffb307f3b21d30658ae1a72 submitted by /u/butterfly_labs
Reddit r/LocalLLaMA 18h ago
If it works - don’t touch it: COMPETITION
Reddit r/LocalLLaMA 19h ago
Update: I fine-tuned Qwen3.5-0.8B for OCR and it outperforms my previous 2B release [GGUF]
Hey everyone! A while ago I shared my fine-tuned Qwen3.5-2B OCR model. Since then I kept working on the pipeline and just released a new version based on Qwen3…
Reddit r/LocalLLaMA 22h ago
NEO-unify — A 2B multimodal model with no Vision Encoder, no VAE. Open source coming "hopefully not too long"
Reddit r/LocalLLaMA 23h ago
Please stop using AI for posts and showcasing your completely vibe coded projects
I get AI-assisted coding, and yes, I have AI ASSIST me. It gets to a point, though, because I can't come on here without seeing a fully AI-coded project…
Reddit r/LocalLLaMA 1d ago
common/gemma4 : handle parsing edge cases by aldehir · Pull Request #21760 · ggml-org/llama.cpp
Reddit r/LocalLLaMA 1d ago
PSA: Having issues with Qwen3.5 overthinking? Give it a tool, and it can help dramatically.
I'm sure everyone has seen the posts from people talking about Qwen 3.5 overthinking, or maybe you've experienced it yourself. Considering we're like 2 months…
Reddit r/LocalLLaMA 1d ago
Trained a 125M LM from scratch instead of fine-tuning GPT-2 — releasing weights + SFT framework for others to build on
Trained a 125M LM from scratch (custom tokenizer) and released an instruct checkpoint and SFT framework so others can fine-tune their own variants. I've been experimenting…
Reddit r/LocalLLaMA 1d ago
Best Local LLMs - Apr 2026
We're back with another Best Local LLMs Megathread! We have continued feasting in the months since the previous thread, with the much anticipated release of the…
Reddit r/LocalLLaMA 1d ago
Ram-air setup and window vent for 1100w capable AI box
So I have a very powerful setup here and I got tired…
Reddit r/LocalLLaMA 1d ago
Follow up post, decided to build the 2x RTX PRO 6000 tower.
Decided to put the effort in and merge my two…
Reddit r/LocalLLaMA 1d ago
Is there anything better than Qwen3.5-27B-UD-Q5_K_XL for coding?
I have a 5090, so my VRAM is limited to 32GB, but I find that Qwen3.5-27B-UD-Q5_K_XL with opencode (and mmproj) does a pretty good job for my use case (mainly…