Unlike cloud-based AI models, which run on someone else's servers, locally hosted large language models are infamous for their sky-high system requirements, with the more powerful ones demanding plenty of tensor cores and ample VRAM. Although I'd argue that Mixture of Experts (MoE) models can run even on ancient systems thanks to MoE offloading, you'll still need a discrete graphics card to run these bulky LLMs.