Most of what I do with local LLMs is text-based work like research, brainstorming, summarization, and the occasional fact-check. Some of the models I run have vision, and that’s been genuinely useful too, especially for analyzing UI screenshots and working through layout ideas. But the vision side just reads images and gives you back text; it doesn’t generate anything visual.