Running self-hosted LLMs can easily turn into an endless upgrade cycle. One week you are testing new quants, the next week you are comparing benchmarks, context lengths, and VRAM usage, all hoping the next model will finally make your setup feel reliable. I went through the exact same phase. Every problem felt like a hardware or model-quality problem waiting to be solved with a bigger download.