• The Hobbyist@lemmy.zip
    4 days ago

    Ollama, latest version. I have it set up with Open-WebUI (though that shouldn’t matter). The 14B model is around 9 GB, which easily fits in the 12 GB of VRAM.
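
    A rough back-of-envelope supports that file size (an editor’s sketch: the ~4.85 bits-per-weight average for q4_K_M and the ~14.8B parameter count are approximations, not claims from the thread):

        # q4_K_M mixes 4- and 6-bit quantized blocks, averaging roughly
        # 4.85 bits per weight in practice.
        params = 14.8e9          # approx. parameter count of the 14B distill
        bits_per_weight = 4.85   # approx. effective rate for q4_K_M
        size_gb = params * bits_per_weight / 8 / 1e9
        print(f"~{size_gb:.1f} GB")  # ~9.0 GB, matching the downloaded file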

    I’m quoting the 28 t/s from memory, but even if I’m off, it’s easily above 20.

    Specifically, I’m running this model: https://ollama.com/library/deepseek-r1:14b-qwen-distill-q4_K_M

    Edit: I confirmed I do get 27.9 t/s using default Ollama settings.
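
    For reference, here is a minimal way to reproduce that measurement (an editor’s sketch, not from the original comment). It queries the local Ollama API, which returns eval_count and eval_duration for each generation, assuming a default install listening on localhost:11434 and the model tag linked above:

        import json
        import urllib.request

        # Request a single, non-streamed generation from the local Ollama
        # server (default port 11434); the reply includes timing statistics.
        payload = json.dumps({
            "model": "deepseek-r1:14b-qwen-distill-q4_K_M",
            "prompt": "Explain the difference between a stack and a queue.",
            "stream": False,
        }).encode()

        req = urllib.request.Request(
            "http://localhost:11434/api/generate",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            result = json.load(resp)

        # eval_count is the number of generated tokens; eval_duration is in
        # nanoseconds, so scale by 1e9 to get tokens per second.
        tps = result["eval_count"] / result["eval_duration"] * 1e9
        print(f"{tps:.1f} t/s")

    The same figure shows up as the "eval rate" line when running ollama run with the --verbose flag.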

    • Viri4thus@feddit.org
      4 days ago

      Thanks. I’ll try Ollama with the q4_K_M quantization. I wouldn’t expect to see a difference between Ollama and SGLang.

    • Jeena@piefed.jeena.net
      4 days ago

      Thanks for the additional information; it helped me decide to get the 3060 12 GB instead of the 4060 8 GB. They are almost the same price, but for my use cases the 3060 12 GB seems to be the better fit even though it is a generation older: it has a wider memory bus and more VRAM. Both video editing and the smaller LLMs should work well enough.