makeasnek@lemmy.ml to AI@lemmy.ml · English · 5 months ago
LLM ASICs on USB sticks?
Source: nostr https://snort.social/nevent1qqsg9c49el0uvn262eq8j3ukqx5jvxzrgcvajcxp23dgru3acfsjqdgzyprqcf0xst760qet2tglytfay2e3wmvh9asdehpjztkceyh0s5r9cqcyqqqqqqgt7uh3n Paper: https://arxiv.org/abs/2406.02528
Mike1576218@lemmy.ml · 5 months ago
llama2 GGUF with 2-bit quantisation only needs ~5 GB of VRAM; 8-bit needs >9 GB. Anything in between is possible. There are even 1.5-bit and 1-bit options (not GGUF, AFAIK). Generally, fewer bits means worse results, though.
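The VRAM figures in the comment above follow roughly from parameter count times bits per weight. A minimal sketch, assuming a 7B-parameter model (llama2-7B size) and counting weights only; real VRAM use is higher because of the KV cache, activations, and per-block quantisation metadata, which is why ~1.6 GiB of 2-bit weights still needs ~5 GB of VRAM in practice:

```python
# Rough weight-memory estimate for a quantized LLM.
# Counts weight storage only, not KV cache or activation memory.

def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """GiB occupied by the weights alone at a given quantisation width."""
    return n_params * bits_per_weight / 8 / 2**30

N = 7e9  # assumed llama2-7B parameter count
for bits in (1, 1.5, 2, 4, 8):
    print(f"{bits:>4} bit: {weight_gib(N, bits):.2f} GiB")
```

At 2 bits this gives about 1.63 GiB of weights and at 8 bits about 6.52 GiB, consistent with the ~5 GB and >9 GB VRAM figures once runtime overhead is added.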