Since I like having more than one local LLM to switch between when analysing tricky development issues, I decided to try out this new MoE model today. It’s a 30B A3B model (30B total parameters, about 3B active per token), which means it’s basically a drop-in replacement for Qwen 3.6 35B A3B: the same llama.cpp parameters are suitable.

On their own published benchmark metrics it’s supposed to be slightly worse than Qwen, but so far I haven’t noticed a difference. It’s tuned to work well in Opencode, which is how I’m running it too.

Try it out and see how it works for you. I know that there are those who would rather use a Canadian model than a Chinese one in today’s political climate, and it does seem to perform better than Gemma 4, at least for me. Just don’t forget to use the llama.cpp PR linked from unsloth’s description until it has been merged into main.

  • e0qdk@reddthat.com
    11 days ago

    Interesting. Looks like I’d currently need to build a special llama.cpp to get it to run on my system, and I think I could get lost for a long time if I start going down that rabbit hole… so maybe not today, but I’ll keep an eye out and give it a try if support lands in main.

    Is it doing any better than Qwen at avoiding getting stuck in thinking loops?