When it comes to local LLMs, we have been told that if you aren’t packing a high-end GPU with a massive pool of VRAM, you are stuck with sluggish response times or ‘out of memory’ errors.