Hey DataHoarders,

About a year ago, after writing separate scripts for a few different sites and feeling bummed about how inefficient this was to scale, I set out to build one master adult video scraper to rule them all. One that would be flexible enough to allow new sites with entirely different structures to be added, without touching the code. And now, after countless late nights, I’m excited to share a project I’ve poured my heart into but can’t exactly put on my LinkedIn: Smutscrape!

Smutscrape scrapes videos with metadata from a rapidly growing list of sites—14 presently, including PornHub, XNXX, xHamster, Xvideos, SpankBang… and, perhaps revealing more about my specific kinks than anyone would have dared ask: IncestFlix/IncestGuru, FamilyPornTV, FamilyPornHD, Family-Sex, 9Vids, and Motherless.

You can of course grab individual videos, but the real magic happens when you set it to scrape all videos from a set—a specific tag, search query, performer, studio, channel, user’s uploads, playlist, etc. Whatever the mode, it gathers relevant metadata as it goes, writing it to .nfo files alongside the videos for richer information in your media manager of choice (particularly Stash, Jellyfin, or Plex).
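
For the curious, an .nfo file is just an XML sidecar that media managers like Jellyfin and Kodi read. Here’s a minimal sketch of writing one; the field names and this helper are illustrative, not Smutscrape’s actual code or schema:

```python
# Minimal sketch: write a Kodi/Jellyfin-style .nfo sidecar next to a video.
# Field names and this helper are illustrative, not Smutscrape's internals.
import xml.etree.ElementTree as ET
from pathlib import Path

def write_nfo(video_path: str, title: str, studio: str,
              performers: list, tags: list) -> Path:
    root = ET.Element("movie")                 # media managers read <movie> sidecars
    ET.SubElement(root, "title").text = title
    ET.SubElement(root, "studio").text = studio
    for name in performers:
        actor = ET.SubElement(root, "actor")
        ET.SubElement(actor, "name").text = name
    for tag in tags:
        ET.SubElement(root, "tag").text = tag
    nfo_path = Path(video_path).with_suffix(".nfo")  # clip.mp4 -> clip.nfo
    ET.ElementTree(root).write(nfo_path, encoding="utf-8", xml_declaration=True)
    return nfo_path
```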

It’ll put the files wherever you like—local filesystem, SMB, or WebDAV share—while respecting your filename-collision policy (nothing gets overwritten by default), remembering each successfully scraped URL so it can skip it next time (unless you tell it to re-scrape for new metadata), and optionally even rotating VPN exit nodes on a set interval, if that’s how you roll. (Which I respect.)
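
The dedupe-and-collision logic described above can be sketched roughly like this; the ledger format, policy names, and helpers are my own guesses for illustration, not Smutscrape’s actual internals:

```python
# Sketch of URL dedupe + filename-collision handling as described above.
# Ledger format, policy names, and helpers are hypothetical, not Smutscrape's code.
from __future__ import annotations
from pathlib import Path

def load_seen(ledger: Path) -> set:
    """Read previously scraped URLs, one per line."""
    return set(ledger.read_text().splitlines()) if ledger.exists() else set()

def mark_seen(ledger: Path, url: str) -> None:
    """Append a successfully scraped URL to the ledger."""
    with ledger.open("a") as f:
        f.write(url + "\n")

def resolve_collision(dest: Path, policy: str = "skip") -> Path | None:
    """Default policy never overwrites; 'rename' appends a counter instead."""
    if not dest.exists():
        return dest
    if policy == "skip":
        return None                               # leave the existing file alone
    if policy == "rename":
        for i in range(1, 1000):
            candidate = dest.with_stem(f"{dest.stem} ({i})")
            if not candidate.exists():
                return candidate
    if policy == "overwrite":
        return dest
    raise ValueError(f"unknown policy: {policy}")
```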

It’s written in Python and configured via one main YAML file, with a separate YAML configuration for each supported site. If you’re familiar with the basic principles of scraping—really just identifying CSS selectors for the elements you want—it’s dead simple to add sites.
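
To give a flavor of what a site definition might look like, here’s a hypothetical sketch. Every key name below is my invention for illustration; it is not an actual config from ./sites/ or Smutscrape’s documented schema:

```yaml
# Hypothetical site config sketch. Key names invented for illustration,
# not Smutscrape's actual schema.
name: ExampleTube
base_url: https://www.example-tube.test
use_selenium: false
modes:
  search: /search?q={query}&page={page}
  tag: /tag/{tag}/{page}
selectors:
  list:
    video_item: "div.thumb a.title"      # link to each video on a listing page
    next_page: "a.pagination-next"
  video:
    title: "h1.video-title"
    download_url: "video > source"
    performers: "div.pornstars a"
    tags: "div.tags a"
```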

The interface is something I’m particularly proud of. Run it without arguments and you’ll get a beautiful terminal output listing all supported sites, their modes, and examples. Run it with just the site argument and you’ll get a more detailed breakdown of that site: the download method, whether Selenium is required, and, for some sites (more soon), my own curated notes covering quirks, coverage gaps, modes unique to the site, and more, all generated on the fly from the site configurations in ./sites/.
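
The on-the-fly listing could work roughly like the toy sketch below. It assumes each file in ./sites/ is a YAML mapping with top-level `name` and `modes` keys, which is my assumption for illustration, not the project’s documented schema (requires PyYAML):

```python
# Toy sketch of building the no-arguments site table from ./sites/*.yaml.
# Assumes top-level "name" and "modes" keys; this is an assumption for
# illustration, not Smutscrape's documented schema. Requires PyYAML.
from pathlib import Path
import yaml

def list_sites(sites_dir: str = "./sites") -> list:
    """Return (site name, sorted mode names) for each YAML config found."""
    rows = []
    for cfg_path in sorted(Path(sites_dir).glob("*.yaml")):
        cfg = yaml.safe_load(cfg_path.read_text())
        rows.append((cfg.get("name", cfg_path.stem), sorted(cfg.get("modes", {}))))
    return rows

def print_table(rows) -> None:
    """Render a simple two-column table: site name, supported modes."""
    for name, modes in rows:
        print(f"{name:<20} {', '.join(modes)}")
```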

Check it out on GitHub: https://github.com/io-flux/smutscrape