Hey folks!
I am looking for feedback from active lemm.ee users on what you all value when it comes to images on Lemmy. I’ll go into a bit of detail about what our options are, and then I would ask you to voice your opinion about the issue in the comments.
First, some context for those who don’t know. Lemmy software can be configured to handle images in three different ways:
- Store images locally - whenever an external image is posted somewhere, lemm.ee will download a permanent local copy. When you view posts, you are seeing our local copy of the image.
- Proxy all images - as with the first option, lemm.ee will download a local copy of external images, but this copy is temporary. It is deleted automatically shortly afterwards, and if users open the relevant post or comment again in the future, lemm.ee will attempt to download a new temporary copy at that point.
- Pass through external images directly - lemm.ee never downloads any external images, users will always connect directly to the source servers to load the images.
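To make the privacy difference between the three options concrete, here is a minimal sketch of which host the user's browser ends up contacting in each mode. The names `ImageMode` and `host_contacted_by_browser` are made up for illustration and are not part of Lemmy.

```python
from enum import Enum
from urllib.parse import urlparse


class ImageMode(Enum):
    """Hypothetical labels for the three configurations described above."""
    STORE_LOCALLY = 1   # option 1: permanent local copy
    PROXY = 2           # option 2: temporary local copy, re-fetched when needed
    PASS_THROUGH = 3    # option 3: the browser loads from the source server


def host_contacted_by_browser(image_url: str, mode: ImageMode,
                              instance_host: str = "lemm.ee") -> str:
    """Return the host that receives the request (and the user's IP address)."""
    if mode is ImageMode.PASS_THROUGH:
        # The browser connects straight to the external image host.
        return urlparse(image_url).hostname
    # Options 1 and 2: the instance fetches the image, so the external host
    # only ever sees the instance's IP address.
    return instance_host
```

With options 1 and 2 the answer is always the instance itself; only option 3 exposes the reader to the external host.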
There are pros and cons to each configuration.
Storing images locally
Benefits:
- Your IP address is never leaked to external image hosts, as you never connect directly to the source server. External image hosts only see the IP address of the lemm.ee server.
- External servers don’t become bottlenecks for opening lemm.ee posts. If an external server is slow, it won’t matter, because the image is always available locally.
Downsides:
- As time goes on, our storage will fill up with hundreds of gigabytes of useless images, most of which will never be viewed again after the relevant posts fall off the front page.
- Many big external image hosts will rate limit bigger Lemmy servers, causing broken images when we fail to make a local copy.
- Crucially: some people love to spend their time uploading illegal content to online servers. There are tools to try and filter out such content, but these are not perfect. The end result is that there is a high chance of some content like this inadvertently reaching lemm.ee storage and staying there permanently. This downside is why lemm.ee has not, and will not, use this particular configuration.
Proxying images
Benefits: Proxying keeps the same benefits as permanent local storage. In addition, by making local copies only temporarily, at the moment our users request them, we free up a ton of storage & remove the risk of permanently storing illegal content on our servers.
Downsides: The key downside is that external rate limits hit us much harder, as we will be requesting external images far more often. This results in a lot of constant broken images on lemm.ee.
Passing through external images
Benefits:
- Images are rarely broken, unless the source server goes down.
- The images never touch our servers, removing a lot of risk with illegal content as well as with storage costs.
Downsides:
- Our users lose a degree of privacy. Every external image that is loaded in your browser will result in the remote server getting a request directly from your computer to fetch that image - this is pretty much the same as if you had visited that external server directly, which lets them log your IP address if they wish.
- When remote servers are slow, it can slow down the entire page load in some cases.
Current situation
Initially, lemm.ee was using the third option of passing through images. As soon as support for option 2, image proxying, was implemented in Lemmy code, we switched to that option, mainly for the privacy benefits. However, after many months, and after being blocked by more and more external servers, it is clear that image proxying is seriously degrading the user experience on lemm.ee. We often end up with broken images, and our users have to deal with the results.
I still believe image proxying is a really valuable feature, but I am starting to believe it is a better fit for small instances which make far fewer requests to external servers.
As a result, I am now seriously considering switching back to the previous method of passing through external images.
This is where you come in - I would ask you as users to please let me know which you value more: the privacy that you get from image proxying, or the better user experience you get from directly passing through images from their source. Please let me know in the comments how you feel. If I get enough feedback from people who are against image proxying, then I will be switching it off for lemm.ee soon. Thanks for reading & sharing your thoughts, and I hope you have a great weekend!
Thanks for your hard work as always.
I’m in favor of moving away from proxying. Too many images break and proxying in general is very wasteful, having to download images from potentially small servers constantly would definitely get you ratelimited.
Passing through external images is OK. Many people often post external links to sites like imgur and catbox anyways because of the file size limits.
I think the end goal would always be to store images locally though. Don’t large instances like Lemmy World and huge Mastodon instances work this way? How do they manage the risk?
I’d prefer proxying
I’m on a lot, and I scroll from All/Hot. I rarely see broken images. They do pop up, but not enough to ever bother me. The only option I’d avoid is method 1, because of that image debacle a few months ago. Regarding methods 2 and 3, they both seemed to work fine. I leave it to smarter minds than myself.
Good work on next, btw. Enjoy your weekend and thank you for everything you do!
I think proxying is very important, else anyone can simply upload an iplogger or possibly a more advanced fingerprinting image. This in combination with observing federated actions will make it very easy to deanonymise almost every single user who interacts in any way, even upvoting.
Can u simply increase the time period that nginx caches images for to avoid some of the rate limiting issues? Otherwise perhaps using proxy lists to proxy requests from lemm.ee to the image hosts is doable (im not sure about the legality of this tho).
Have u emailed the image hosts letting them know what u do and asking if they can remove ur rate limit? (idk if they would be receptive to this without a financial incentive.)
No opinion for the moment, but thank you for the very detailed post
I would pick image pass through because I’m not necessarily concerned about the image hoster logging my IP. I have been more frustrated with broken links, so I am very much opposed to the current method.
Images have been a bit problematic for me lately, for sure. If storing them locally is not a solid option, the question I would have is how much of the other requests are proxied? As in, what other stuff apart from images/media is not being proxied? If the clients are leaking IPs anyway, maybe it’s okay to have them download the images, too. But if the server is proxying everything else then having some sort of a cache might not be that bad an idea
hmm maybe proxy or store it on an external dns
Option 3.
Option 3, agree with others, there are other privacy focused instances.
I am also in favor of option 3. Hosting images has a chance of hosting some csam and that might take you out.
Not a lemm.ee user, but here’s my thoughts on #2 since it affects me via federation:
I am not a fan of how Lemmy chose to implement image proxying. Specifically, federating the proxied URL.
That frequently prevents my instance from fetching a thumbnail locally (option 1 above). Which, ironically, increases the load on your server as my instance has to fetch it from your proxy every time instead of just once to generate a local copy here.
From a UI development standpoint, the proxied thumbnail URLs also make it harder to detect the image type (gif, static image, video) to handle rendering. It also complicates other proxying/caching methods I have in place. Ultimately, in the UI I develop, I’ve had to resort to passing thumbnail images through a function to un-proxy them so they can be handled sanely.
So I generally wish that admins avoid Lemmy’s proxying until it no longer federates the proxied URL and does something sane like just return that for the local API calls.
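For readers wondering what "un-proxying" a thumbnail URL might look like, here is a rough sketch. It assumes the proxied URL carries the original address percent-encoded in a `url` query parameter under a path like `/api/v3/image_proxy`; both the path and the function name `unproxy_image_url` are assumptions for illustration, not taken from the comment above or from any particular UI.

```python
from urllib.parse import parse_qs, urlparse


def unproxy_image_url(url: str, proxy_path: str = "/api/v3/image_proxy") -> str:
    """Return the original image URL if `url` looks proxied, else `url` unchanged.

    `proxy_path` is an assumed default; adjust it to whatever the instance uses.
    """
    parsed = urlparse(url)
    if parsed.path != proxy_path:
        # Not a proxied URL: leave it alone.
        return url
    # parse_qs percent-decodes the value, recovering the source URL.
    original = parse_qs(parsed.query).get("url")
    return original[0] if original else url
```

Once the original URL is recovered, a frontend can inspect its file extension or fetch it directly, which is the kind of handling the comment describes.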
Wow didnt know they federated the proxied image kinda stupid ngl.
We really need some sort of distributed content hosting for images that allows everything to have a single unique address servable by anyone. Perhaps a bittorrent that has all federated media. Can still have the address to the media be a url for the local instance so as not to break frontends, but backends could recognise it as a universal bittorrent resource and fetch it in a distributed manner.
Would also mean clients can implement their own retrieval as not to rely on the server but that wouldnt be required.
I suppose u could also put websites content into the same system as a sort of archive. Make the fediverse more p2p distribute load to more smaller nodes improving resiliency.
Anyone know how peertube has done their bittorrent implementation?
Storing permanently locally doesn’t sound like a good solution…
If you could adjust the length of time to keep cached/proxy’d images locally and increase it significantly, I’d think that would be the preferred solution.
The various image hosting sites that people choose being down or otherwise dysfunctional seems more common than you’d think. One that’s quite popular lately just flatly blocks all VPN and Tor users, leading to many broken images for some of us.
It’s too bad that the image cache you have stores things permanently. Having them expire after six hours or something would seem like a better option. Maybe somehow route it through a normal caching proxy instead of the built-in lemmy one?
I believe the normal lemmy one is nginx, which is reasonably configurable, but anything else can be used.
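Several comments suggest keeping proxied images around for longer. The effect can be illustrated with a small time-to-live cache: the longer the lifetime, the fewer requests reach the external host. This is only a toy sketch with made-up names (`TTLImageCache`, `upstream_requests`); it is not nginx configuration and not how Lemmy's proxy is actually implemented.

```python
import time
from typing import Callable, Dict, Tuple


class TTLImageCache:
    """Toy cache that re-fetches an image only after its copy has expired."""

    def __init__(self, fetch: Callable[[str], bytes], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # function that downloads from the source
        self._ttl = ttl_seconds      # how long a cached copy stays valid
        self._clock = clock          # injectable clock, handy for testing
        self._entries: Dict[str, Tuple[float, bytes]] = {}
        self.upstream_requests = 0   # counts requests sent to external hosts

    def get(self, url: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            # Still fresh: serve the local copy, no external request made.
            return entry[1]
        # Missing or expired: go back to the source server.
        self.upstream_requests += 1
        data = self._fetch(url)
        self._entries[url] = (now, data)
        return data
```

A six-hour lifetime, as suggested above, would correspond to `ttl_seconds=6 * 60 * 60`. The trade-off is that expired copies still have to be cleaned up separately, and every cached copy sits on the server for that long.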
Option 3 is the only one that seems sustainable long term. Donations will NEVER keep up with user growth, thus storage costs will balloon out of control.
Completely avoiding any chance of illegal content touching the servers should immediately have everyone agreeing on this option. I doubt anyone here is willing to foot legal bills and as such even minor legal actions would be the end of this instance.
Privacy is nice but ip logging is the simplest thing to “protect” against, even with a free VPN. If those claiming privacy concerns here aren’t already using a VPN and are depending purely on lemm.ee’s proxy then their internet hygiene needs an update.
As for usability, an image being deleted from the external provider presents the same issue to the user under options 2 and 3. The cache from option 2 will eventually get cleared, and it’ll fail to pull a fresh copy if the image was deleted from the external host.