We backed up Spotify (metadata and music files). It’s distributed in bulk torrents (~300TB), grouped by popularity.

This release includes the largest publicly available music metadata database with 256 million tracks and 186 million unique ISRCs.

It’s the world’s first “preservation archive” for music which is fully open (meaning it can easily be mirrored by anyone with enough disk space), with 86 million music files, representing around 99.6% of listens.

  • Zombie@feddit.uk
    1 month ago

    After Meta scraped all their books they have the perfect defense now. All they have to say is “we’re training a music AI” and they’re apparently untouchable.

    • Signtist@bookwyr.me
      30 days ago

      Well, they have to say “we’re training a music AI” while slipping several million dollars into the pockets of the right people. Rich people don’t win legal battles by actually proving that what they did wasn’t illegal; they do it by discreetly paying people to say it wasn’t.

    • slappyfuck@lemmy.ca
      29 days ago

      Omg if my girlfriend had $10k to give me for a server I could buy like a RAM or two!

    • Lyra_Lycan@lemmy.blahaj.zone
      30 days ago

      Step 1: Buy £6,000 worth of identical hard drives and a motherboard with 16 SATA ports. Or £12,000 worth and a RAID 1 server rig. Or £24,000 and RAID 6

      • Revan343@lemmy.ca
        30 days ago

        You can get PCI cards to add more SATA ports; they don’t all need to be on the motherboard.

        • Strawberry@lemmy.blahaj.zone
          30 days ago

          iirc her writings gave me the impression that she basically worships communism and famous soviet communist leaders in a vaguely religious way. I can go look again later and elaborate. It could just be the language barrier that gave me that impression.

          • mirshafie@europe.pub
            30 days ago

            Down with the bourgeoisie

            Eat the rich

            Sodomize the land-owners

            Impale all people who have more than 25 reál in their pocket

            Literally murder all human beings regardless of their political beliefs

            • Strawberry@lemmy.blahaj.zone
              28 days ago

              I went to look for a link for you and found that I was misremembering. I was thinking of Alexandra Elbakyan, the creator of Sci-Hub, one of the libraries that Anna’s Archive indexes. In case you’re curious about her, here’s an archive of her personal page on Sci-Hub

        • skisnow@lemmy.ca
          1 month ago

          Yeah as with most of the internet, it’s only worth downloading anything uploaded before 2023.

          So far, LLMs have done so much more harm than help.

        • nagaram@startrek.website
          1 month ago

          I’m not convinced AI slop can compete with the backlog of organic slop, personally.

          But yeah a fuckton is probably slop either way

          • bear@lemmy.blahaj.zone
            30 days ago

            AI slop is accelerating exponentially for the foreseeable future. It won’t take long for world data storage to be a limiting factor.

        • Hawk@lemmy.dbzer0.com
          1 month ago

          Interestingly enough, with the data they provide, figuring out how much of it is AI slop wouldn’t be that hard I think

          • infinitesunrise@slrpnk.net
            1 month ago

            A RAID6 of 24 * 20TB drives could contain that with both parity and hotswap, with room to spare. Let’s say $400 per refurb drive, $2500 rackmount SAS enclosure, $2000 SAS RAID card, $14,100 total. Assuming you already have the server and power and SAS cables.
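
The arithmetic in that estimate can be checked with a short sketch. It assumes "hotswap" means one drive held back as a hot spare, that RAID 6 costs two drives' worth of capacity for parity, and that capacities are decimal TB before any filesystem overhead. The class and field names are made up for illustration; the prices are the ones quoted in the comment.

```python
from dataclasses import dataclass


@dataclass
class Raid6Build:
    """Hypothetical helper for sanity-checking a RAID 6 shopping list."""

    drives: int                  # total drives purchased
    drive_tb: int                # capacity per drive, in decimal TB
    hot_spares: int = 1          # assumption: one idle drive kept as a spare
    drive_cost: int = 400        # USD per refurbished drive (from the comment)
    enclosure_cost: int = 2500   # USD, rackmount SAS enclosure
    raid_card_cost: int = 2000   # USD, SAS RAID card

    def usable_tb(self) -> int:
        # RAID 6 spends two drives' worth of space on parity;
        # hot spares sit outside the array and add no capacity.
        active = self.drives - self.hot_spares
        return (active - 2) * self.drive_tb

    def total_cost(self) -> int:
        return (
            self.drives * self.drive_cost
            + self.enclosure_cost
            + self.raid_card_cost
        )


build = Raid6Build(drives=24, drive_tb=20)
print(build.usable_tb())   # 420
print(build.total_cost())  # 14100
```

Under those assumptions the array holds 420TB, which leaves headroom over the roughly 300TB release, and the total matches the $14,100 quoted.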

            • wheezy@lemmy.ml
              1 month ago

              You could budget this way down. I run 10+2 12TB with Unraid. No reason for a RAID card if it’s for archive and personal use.

              • brognak@lemmy.dbzer0.com
                1 month ago

                100% this. People who store easily replaceable media on RAID are just throwing away money (unless you have a need for faster read/write). If it’s your family photos, a copy of your in-progress thesis, or another irreplaceable piece of info/content, go for it.

                I have like a 40TB Unraid NAS and I get asked pretty much every time I talk to someone about it how I do backups. Easy: I back up my *arr stack databases and, in case of a failure, I restore them and let it pull down everything over time. Which I have done in the past when I wanted to upgrade quality; it was easier for me to scrub it all and start over than make upgrade profiles and such.

                Or that’s what I would have done, now I mostly use DebridService du jour and Stremio :-)

            • oyo@lemmy.zip
              1 month ago

              This gif is going to completely lose its punch in a couple years.

            • N0x0n@lemmy.ml
              1 month ago

              10 US dollars per TB?? 🤣🤣 More like 30/35€ per TB for a good graded HDD!

              Let’s not talk about SSDs or NVMe, which are more like 120€/TB.

              I always hear people say that storage comes cheap nowadays… I’m still looking for that cheap HDD on Amazon… It has been 10 years 🤣🤣

                • HelloRoot@lemy.lol
                  1 month ago

                  The US of A often has way lower HDD prices compared to Europe.

                  Take the serverpartdeals price and add shipping and import tax.

    • AnarchistArtificer@slrpnk.net
      1 month ago

      They’ve released torrents of the metadata, and they plan to release the music files, but they haven’t yet. They intend to start by offering the downloads as bulk torrents, but they’re open to considering implementing the ability to download single songs in the future.

      So in short, yes, but you can’t download them yet

    • mic_check_one_two@lemmy.dbzer0.com
      30 days ago

      Not yet, but that’s the end goal. The tricky part is that they’re only offering bulk downloads for now, which means downloading a single artist or album would be difficult/impossible. You’d need to download the entire compressed file of like 300GB of music, then extract the specific songs/artists/albums you wanted. The goal for now is preservation, meaning they want to make the bulk download as easy as possible, to make sure people can preserve it. Once they’ve got that in a pretty good spot, they may look into allowing more granular downloads.

    • Vespair@lemmy.zip
      1 month ago

      Spotify claims to offer lossless quality on much of their catalog; is this claim false or is there something more I’m missing here?

      • degen@midwest.social
        30 days ago

        That’s for premium accounts, which they probably aren’t scraping with. And I think it’s still not FLAC quality

    • EccTM@lemmy.ml
      1 month ago

      They said this in the linked blog post:

      A while ago, we discovered a way to scrape Spotify at scale.

      Seems like reason enough to choose to scrape Spotify to me.

    • Blackmist@feddit.uk
      1 month ago

      Spotify has lossless now. Although if you’re listening on anything with Bluetooth then you probably won’t notice anyway.

    • infinitesunrise@slrpnk.net
      1 month ago

      Both. Per the SQL schema printed in the article, the table track_audio_features has both tempo and key fields, along with many other technical fields. Worth checking out; it’s near the bottom of the page.
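
For anyone who wants to poke at those fields, here is a rough sketch of a tempo/key lookup. Only the table name track_audio_features and the columns tempo and key come from the comment above; the track_id column, the column types, the sample rows, and the use of SQLite are all assumptions made so the example runs on its own.

```python
import sqlite3

# In-memory stand-in for the real metadata; the column types and the
# track_id column are assumptions, not taken from the published schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE track_audio_features ('
    'track_id TEXT PRIMARY KEY, tempo REAL, "key" INTEGER)'
)
conn.executemany(
    "INSERT INTO track_audio_features VALUES (?, ?, ?)",
    [("a", 128.0, 5), ("b", 174.0, 0), ("c", 126.5, 5)],  # made-up rows
)


def tracks_in_tempo_range(conn, low, high, key=None):
    """Return (track_id, tempo, key) rows with low <= tempo <= high."""
    sql = (
        'SELECT track_id, tempo, "key" FROM track_audio_features '
        "WHERE tempo BETWEEN ? AND ?"
    )
    params = [low, high]
    if key is not None:
        # Optionally narrow to a single musical key.
        sql += ' AND "key" = ?'
        params.append(key)
    sql += " ORDER BY tempo"
    return conn.execute(sql, params).fetchall()


matches = tracks_in_tempo_range(conn, 125, 130, key=5)
print(matches)  # [('c', 126.5, 5), ('a', 128.0, 5)]
```

The same query shape should carry over to the real data once the actual column names and types are confirmed against the schema in the article.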

      • driving_crooner@lemmy.eco.br
        1 month ago

        Would love lmao. Just bought a second-hand VDJ and I’m starting to experiment with Mixxx, and I don’t know if it’s the style I like (latincore and adjacent) or if Mixxx’s BPM detection isn’t that good.

        • Rai@lemmy.dbzer0.com
          1 month ago

          Good on you for starting that up! I wish you much success in your mixing and/or producing journey!

    • mic_check_one_two@lemmy.dbzer0.com
      30 days ago

      Yes, and it hasn’t been easy to dig up until recently. There were a few ways to search the “hidden” metadata fields that Spotify uses internally. But it definitely hasn’t been easy or straightforward.

      Those hidden fields are how Spotify recommends similar artists. You have a few bands on repeat with specific instruments, chord progressions, and singer vocal range? Gee, maybe you’ll enjoy other bands that are similar to that…