• Mike1576218@lemmy.ml
    5 months ago

    Llama 2 as GGUF with 2-bit quantisation only needs ~5 GB of VRAM; 8-bit needs >9 GB. Anything in between is possible. There are even 1.5-bit and 1-bit options (not GGUF, AFAIK). Generally, fewer bits mean worse results, though.
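
    The bits-to-VRAM relationship above can be sketched with a back-of-the-envelope formula: parameter count × bits per weight ÷ 8 gives the raw weight size in bytes. This is a rough illustration only, not how GGUF actually sizes files — real quantised formats store per-block scales on top of the weights, and the runtime adds KV-cache and activation memory, which is why the figures in the comment run higher than the bare weight sizes below.

    ```python
    # Rough estimate of quantised weight size; the function name and the
    # "decimal GB" convention are my own choices for this sketch.
    def approx_weight_gb(params_billion: float, bits_per_weight: float) -> float:
        """Raw weight bytes for a model, ignoring scales and runtime overhead."""
        bytes_total = params_billion * 1e9 * bits_per_weight / 8
        return bytes_total / 1e9  # decimal GB

    # Weight-only sizes for a 7B-parameter model at the bit widths mentioned:
    for bits in (1.0, 1.5, 2.0, 8.0):
        print(f"{bits:>4} bit: ~{approx_weight_gb(7, bits):.2f} GB")
    ```

    The gap between these numbers and the ~5 GB / >9 GB figures is the overhead a real runtime adds on top of the weights themselves.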