With most x86 devices in my arsenal already engaged in DIY projects, I’ve gone back to experimenting with ARM boards. In fact, I’ve been running a bunch of lightweight LLMs on my single-board computers, and the boards are surprisingly decent at handling sub-4B models. Toss them in a cluster, and they can even handle the likes of 9B LLMs (provided you’re willing to overlook the abysmally low token generation rates).