I’ve read multiple times that CUDA dominates, mostly because NVIDIA dominates. ROCm is the AMD equivalent, but OpenCL also exists. From my understanding, these are technologies used to program graphics cards, but I always thought that shaders were used for that.
There is a huge gap in my knowledge and understanding about this, so I’d appreciate somebody laying this out for me. I could ask an LLM and be misguided, but I’d rather not 🤣
AFAIK it’s only NVIDIA that allows containers shared access to a GPU on the host.
With the majority of code being deployed in containers, you end up locked into the NVIDIA ecosystem even if you use OpenCL. So I guess people just use CUDA since they are limited by the container requirement anyways.
That’s from my experience using OpenGL headless. If I’m wrong please correct me; I’d prefer being GPU agnostic.
Check implementations before saying shit like that. Nvidia has historically had bad open source driver support, which makes it hard for people to implement vGPU usage. They actively blocked us from using their cards remotely until COVID hit. Then they gave out the code to do it. They still limit the use of consumer-level cards for virtualization. They had to give out a toolkit for us to be able to use their cards in Docker. Other cards can be accessed just by sharing the /dev driver files with the container.
Can you share sample code I can try or documentation I can follow of using an AMD GPU in that way (shared, virtualized, using only open source drivers)?
Check Wolf (in my other comment), it’s the best example of GPU virtualization usage.
Otherwise you can check other docker images using GPU for computing, like jellyfin for instance, or nextcloud recognize, nextcloud memories and its transcoding instance,…
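As a rough sketch of what "sharing the dev driver files" means in practice: with Mesa/amdgpu, the container gets the host's /dev/dri device nodes, and ROCm compute workloads also need /dev/kfd. The Python helper below only builds the `docker run` command line. The function name and the image name `my-image` are made up for illustration, and the `video` group is an assumption (some distros also need the `render` group).

```python
import shlex


def amd_gpu_docker_cmd(image, command=(), compute=False):
    """Build a `docker run` argv that passes AMD GPU device nodes to a container.

    Hypothetical helper for illustration; adjust groups and devices to your host.
    """
    # /dev/dri holds the DRM card and render nodes used by Mesa (OpenGL, Vulkan, VA-API).
    argv = ["docker", "run", "--rm", "--device=/dev/dri"]
    if compute:
        # /dev/kfd is the kernel interface used by ROCm compute workloads.
        argv.append("--device=/dev/kfd")
        # Assumption: the device nodes are owned by the "video" group on the host.
        argv += ["--group-add", "video"]
    argv.append(image)
    argv.extend(command)
    return argv


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(amd_gpu_docker_cmd("my-image", ["eglinfo"])))
```

Note there is no `--gpus=all` here; that flag belongs to the NVIDIA container toolkit path.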
This cannot be right. I’m pretty sure that it is possible to run OpenCL applications in containers that are sharing a GPU.
I should test this if I have time. My plan was to use a distrobox container since that shares the GPU by default and run something like lc0 to see if opencl acceleration works.
Now where is my remindme bot? (I won’t have time).
You really piqued my interest. I use docker/podman.
W/ an AMD graphics card, eglinfo on the host shows the card is AMD Radeon and driver is matching that.
In the container, without --gpus=all, it shows the card is unknown and the driver is “swrast” (so just CPU fallback).
To make --gpus=all work, it gives the error
I was doing a bad job searching before. I found that AMD can share the GPU, it just works a little differently in terms of how to launch the container. https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/amdgpu-install.html#amdgpu-install-dkms
But sadly my AMD GPU is too old/junk to have current driver support.
Anyways, appreciate the reply! Now I can mod my code to run on cheaper cloud instances.
(Note I’m an OpenGL/3D app developer, but probably OpenCL works about the same architecturally)
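For anyone hitting the same "swrast" fallback inside a container: a quick first check is whether the container can see any DRM render nodes at all. This is a minimal sketch, assuming the usual Linux layout where render nodes show up as /dev/dri/renderD*; the function name is made up for illustration.

```python
import glob
import os


def find_render_nodes(dev_dir="/dev/dri"):
    """Return the DRM render nodes visible under dev_dir, sorted by name."""
    # Render nodes are conventionally named renderD128, renderD129, ...
    return sorted(glob.glob(os.path.join(dev_dir, "renderD*")))


if __name__ == "__main__":
    nodes = find_render_nodes()
    if nodes:
        print("Render nodes visible:", ", ".join(nodes))
    else:
        # Without a render node, EGL/OpenGL will likely fall back to software rendering.
        print("No render nodes visible; expect a software (swrast) fallback.")
```

If this prints nothing useful inside the container but finds a node on the host, the device files were not passed through.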