Local AI coding models have been growing in popularity, and for good reason. Between Qwen3 Coder Next, Devstral 2, gpt-oss-120b, and a wave of smaller fine-tuned models, the pitch is that you can run a legitimate coding assistant on your own hardware, without a cloud subscription. I’ve been running local models for a while now, and I genuinely believe they’ve reached a point where they’re useful for real work. But “useful” is a wide spectrum: I already find them useful for reverse engineering and static analysis, but what about good old coding? I wanted to find where some of the most popular models fall on that spectrum… so I designed a test.