• mozz@mbin.grits.dev
      5 months ago

      Simple and just works

          # Module-level imports this method relies on.
          import time
          import urllib.parse
          import urllib.robotparser

          import requests

          def fetch_html(self, url):
              # Cache one robots.txt parser per domain so the file is fetched only once.
              domain = urllib.parse.urlparse(url).netloc
              if domain not in self.robot_parsers:
                  rp = urllib.robotparser.RobotFileParser()
                  rp.set_url(f'https://{domain}/robots.txt')
                  rp.read()
                  self.robot_parsers[domain] = rp

              # Respect robots.txt before making the request.
              rp = self.robot_parsers[domain]
              if not rp.can_fetch(self.user_agent, url):
                  print(f"Fetching not allowed by robots.txt: {url}")
                  return None

              # Throttle: wait until self.delay seconds have passed since the last fetch.
              if self.last_fetch_time:
                  time_since_last_fetch = time.time() - self.last_fetch_time
                  if time_since_last_fetch < self.delay:
                      time.sleep(self.delay - time_since_last_fetch)

              headers = {'User-Agent': self.user_agent}
              response = requests.get(url, headers=headers)
              self.last_fetch_time = time.time()

              # Only a 200 response counts as success; anything else is reported.
              if response.status_code == 200:
                  return response.text
              print(f"Failed to fetch {url}: {response.status_code}")
              return None
      

      Randomly selected something from a project I’m working on that’s simple and just works. Show me fewer than 300 lines of .NET that do the same, and I would be somewhat surprised.
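
      The robots.txt check in the snippet above can be tried without any network access by handing the rules straight to the parser. A minimal sketch, assuming a made-up rule set and a hypothetical user agent name `ExampleBot`; `RobotFileParser.parse()` takes the file's lines directly instead of fetching them with `read()`:

```python
import urllib.robotparser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    """Build a parser from in-memory robots.txt text (no HTTP request)."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


rp = build_parser(ROBOTS_TXT)

# "ExampleBot" is an assumed user agent; it falls under the "*" group here.
allowed = rp.can_fetch("ExampleBot", "https://example.com/index.html")
blocked = rp.can_fetch("ExampleBot", "https://example.com/private/page.html")
delay = rp.crawl_delay("ExampleBot")

print(allowed, blocked, delay)
```

      The same parser also exposes the site's `Crawl-delay` value, which could feed a throttle like the `self.delay` check above rather than a fixed delay.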

        • mozz@mbin.grits.dev
          5 months ago

          I am familiar.

          Not saying don’t pay your bills with it; that part sounds great. I was just confused by this guy’s enthusiasm for it, that’s all.

      • ShortFuse@lemmy.world
        5 months ago

        You left out the hundreds of lines from the library you’re importing. Where’s all the code for robotparser?

        You can import libraries with C# too. That says nothing about the differences between languages.
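
        To give a rough sense of what that import hides, here is a deliberately naive sketch of robots.txt matching written by hand. It is not the standard library's implementation: the function names are hypothetical, and it only handles `Disallow` prefixes in the `User-agent: *` group, ignoring `Allow`, wildcards, crawl delays, and per-agent groups.

```python
def parse_disallow_rules(robots_txt: str) -> list[str]:
    """Collect Disallow paths from the `User-agent: *` group only."""
    rules = []
    in_wildcard_group = False
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            # Track whether the rules that follow apply to every agent.
            in_wildcard_group = (value == "*")
        elif field == "disallow" and in_wildcard_group and value:
            rules.append(value)
    return rules


def is_allowed(path: str, rules: list[str]) -> bool:
    """A path is allowed unless it starts with one of the Disallow prefixes."""
    return not any(path.startswith(rule) for rule in rules)


# Made-up robots.txt text for illustration.
sample = "User-agent: *\nDisallow: /private/\n\nUser-agent: Other\nDisallow: /x/\n"
rules = parse_disallow_rules(sample)
print(rules, is_allowed("/index.html", rules), is_allowed("/private/a", rules))
```

        Even this stripped-down version needs a couple of dozen lines, and a real parser has to cover many more cases, which is the code a library import leaves out of the line count.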

        • mozz@mbin.grits.dev
          5 months ago

          That’s exactly my point though. Part of my assertion of a big weakness in C# is that more mainstream languages (Python or Node) have massive libraries of existing code you can draw on for simple stuff like parsing robots.txt, whereas C# has an ecosystem that probably seems pretty luxurious if you’re comparing it to nothing, but is well short of what OSS programmers are accustomed to.

          So yeah, it’s not a purely fair language-design comparison, but it’s a perfectly fair “how easy is it to get stuff done in this language” comparison. And then at a certain point it starts to become not just a convenience but a whole new area of computation (something like numpy or pytorch) that’s simply impossible in C# without a whole research project devoted to implementing it. That said, I’m sure there are areas (especially in heavily business-oriented fields like airline or medical backends) where it’s the other way around, and you have C#-specific stuff for that domain that would be really difficult to replicate in some other environment. I’m not trying to say that side doesn’t exist, just saying what’s generally applicable to my experience.

          So I’m not being critical of C# because of language features (it seems perfectly fine and functional; I get what people mean when they say they get work done in it every day). But I also think it’s relevant that it’s missing some big advantages if you’re trying to go beyond the “it doesn’t actively punish you for using it” stage.