Business · The Register
Usage-based pricing killing your vibe? Here's how to roll your own local AI coding agents
Compiled by KHAO Editorial — aggregated from 1 outlet. See llms.txt for citation guidance.
With model devs pushing more aggressive rate limits, raising prices, or even abandoning subscriptions for usage-based pricing, that vibe-coded hobby project is about to get a whole lot more expensive.
Key facts
- Alibaba recently dropped Qwen3.6-27B, which the cloud and e-commerce giant boasts packs "flagship coding power" into a package small enough to run on a 32 GB M-series Mac or 24 GB GPU
- Qwen3.6-27B supports a 262,144-token context window, but unless you have a high-end Mac or a workstation GPU, you probably don't have enough memory to take advantage of all of it
- With all that out of the way, here's the launch command they're using for a 24 GB Nvidia RTX 3090 Ti, but the same command should work fine if you're using an AMD or Intel GPU or are running Llama.cpp (a representative command is sketched after this list)
- If you're planning on running Llama.cpp and accessing it on another machine, you'll also want to add --host 0.0.0.0 to the command, which will expose it to your local area network
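The launch command itself didn't survive aggregation, so here is a minimal sketch of what such a Llama.cpp invocation typically looks like, assuming the llama-server binary from a llama.cpp build and a hypothetical quantized GGUF file of the model (the filename is illustrative, not the one from the original article):

    # -ngl 99 offloads every layer to the GPU; -c 32768 caps the context
    # window so the KV cache fits alongside the weights in 24 GB of VRAM;
    # --host 0.0.0.0 exposes the server to the local network
    llama-server -m Qwen3.6-27B-Q4_K_M.gguf -ngl 99 -c 32768 \
      --host 0.0.0.0 --port 8080

Lowering the -c value (or raising it, on hardware with more memory) is the main lever for trading context length against VRAM headroom.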
Summary
Over the past few weeks, they've seen Anthropic toy with dropping Claude Code from its most affordable plans, while Microsoft has skipped testing the waters and moved GitHub Copilot to a purely usage-based model. Do developers even need Anthropic or OpenAI's top models, or can they get away with a smaller local one? It so happens that Alibaba recently dropped Qwen3.6-27B, which the cloud and e-commerce giant boasts packs "flagship coding power" into a package small enough to run on a 32 GB M-series Mac or 24 GB GPU. Until recently, the models and software stack were immature, making them useful tools, but not necessarily good enough to compete with larger frontier models.
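Once running, llama-server exposes an OpenAI-compatible API, so most coding agents and editor plugins that accept a custom base URL can be pointed at the local endpoint. A quick smoke test from another machine on the LAN, assuming a placeholder server address of 192.168.1.50 (the "model" value is arbitrary here, since the server loads a single model):

    curl http://192.168.1.50:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{"model": "local", "messages": [{"role": "user", "content": "Write a Python function that reverses a string."}]}'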