• JackGreenEarth@lemm.ee
    5 months ago

    I only need ~4 GB of RAM/VRAM for a 7B model, and my GPU only has 6 GB of VRAM anyway. Either 7B models are smaller than you think, or you have a very inefficient setup.
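
    A rough sanity check (assuming 4-bit quantized weights and ignoring context/KV-cache overhead, which adds a bit on top) lands right around that figure:

    ```python
    # Weights-only memory estimate for a 7B model at an assumed 4-bit quantization.
    params = 7e9
    bits_per_weight = 4
    gb = params * bits_per_weight / 8 / 1e9
    print(f"~{gb:.1f} GB for weights")  # ~3.5 GB, so ~4 GB with some context overhead
    ```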

    • Fisch@discuss.tchncs.de
      5 months ago

      That’s weird; maybe I actually am doing something wrong. Could it be because I’m using GGUF models?

      • Mike1576218@lemmy.ml
        5 months ago

        Llama 2 GGUF with 2-bit quantisation only needs ~5 GB of VRAM; 8-bit needs >9 GB. Anything in between is possible. There are even 1.5-bit and 1-bit options (not GGUF, AFAIK). Generally, fewer bits means worse results, though.
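
        For a feel of how bit width scales the footprint, here is a back-of-the-envelope sketch (assumptions: a 7B model, a flat ~1 GB allowance for context/KV cache; these are weights-only lower bounds, and real GGUF files mix quant types per tensor plus runtime buffers, so actual VRAM use is higher and varies):

        ```python
        # Rough weight sizes for a 7B model at the quantization levels mentioned above.
        PARAMS = 7e9          # assumed parameter count (7B)
        OVERHEAD_GB = 1.0     # assumed allowance for context / KV cache / activations

        for bits in (1, 1.5, 2, 4, 8):
            weights_gb = PARAMS * bits / 8 / 1e9
            print(f"{bits:>4}-bit: ~{weights_gb:.2f} GB weights, "
                  f"~{weights_gb + OVERHEAD_GB:.2f} GB-ish total")
        ```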