• Fisch@discuss.tchncs.de · 4 months ago

    That’s weird, maybe I actually am doing something wrong. Could it be because I’m using GGUF models?

    • Mike1576218@lemmy.ml · 4 months ago

      Llama 2 as GGUF with 2-bit quantisation only needs ~5GB of VRAM, while 8-bit needs >9GB. Anything in between is possible too. There are even 1.5-bit and 1-bit options (not GGUF AFAIK). Generally, fewer bits means worse results though.
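
      The rule of thumb behind those figures is just parameter count × bits per weight; the real footprint is higher because of the KV cache, activations, and per-block quantisation overhead. A minimal back-of-the-envelope sketch (hypothetical helper name, model size picked purely for illustration, not part of any llama.cpp/GGUF tooling):

      ```python
      # Rough lower-bound estimate of the memory needed just to hold quantised weights.
      # Ignores KV cache, activations, and quantisation-block overhead, so real VRAM
      # usage will be noticeably higher.

      def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
          bytes_total = n_params * bits_per_weight / 8
          return bytes_total / (1024 ** 3)

      # Example: a ~13-billion-parameter model at common quantisation levels.
      for bits in (2, 4, 8, 16):
          print(f"{bits}-bit: ~{weight_memory_gib(13e9, bits):.1f} GiB for weights alone")
      ```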