The Open-Source Software Saving the Internet From AI Bot Scrapers (www.404media.co)
Posted by fattyfoods@feddit.nl to Open Source@lemmy.ml · 18 days ago · 91 comments
medem@lemmy.wtf · 18 days ago
<Stupidquestion> What advantage does this software provide over simply banning bots via robots.txt? </Stupidquestion>
kcweller@feddit.nl · 18 days ago
robots.txt relies on the client respecting the rules, for instance by identifying itself as a scraper. AI scrapers don’t respect this trust, and thus robots.txt is meaningless.
medem@lemmy.wtf · 18 days ago
Well, now that y’all put it that way, I think it was pretty naive of me to think that these companies, whose business model is basically theft, would honour a lousy robots.txt file…
PlantPowerPhysicist@discuss.tchncs.de · 18 days ago
The scrapers ignore robots.txt. It doesn’t really ban them; it just asks them not to access things, but they are programmed by assholes.
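To make the "it just asks" point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` user agent, the example.org URL, and both helper functions are hypothetical, made up for illustration. The point it tries to show is that the check runs entirely on the client side, so a scraper that skips it faces no barrier at all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks every crawler to stay away from the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def polite_client_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """A well-behaved crawler parses robots.txt and honours the answer."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def rude_client_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """A scraper that never consults robots.txt: nothing on the server stops it."""
    return True


url = "https://example.org/articles/1"  # assumed URL, for illustration only
print(polite_client_may_fetch(ROBOTS_TXT, "ExampleBot", url))
print(rude_client_may_fetch(ROBOTS_TXT, "ExampleBot", url))
```

The polite client declines to fetch the page, while the rude one proceeds anyway. robots.txt only changes the behaviour of clients that choose to read it.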
thingsiplay@beehaw.org · 17 days ago
The difference is: robots.txt is a promise without a door. Anubis is a physical closed door that opens up after some time.
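The "door that opens up after some time" can be sketched as a generic hash-based proof-of-work challenge. This is an illustrative assumption about the general technique, not Anubis's actual code or protocol. The challenge string, the difficulty value, and the `solve`/`verify` helpers are all hypothetical.

```python
import hashlib
from itertools import count


def _digest(challenge: str, nonce: int) -> str:
    """SHA-256 hex digest of the challenge concatenated with a candidate nonce."""
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()


def solve(challenge: str, difficulty: int) -> int:
    """Client side: try nonces until the digest starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    for nonce in count():
        if _digest(challenge, nonce).startswith(prefix):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: checking a submitted nonce costs a single hash."""
    return _digest(challenge, nonce).startswith("0" * difficulty)


challenge = "example-challenge"  # hypothetical value a server might hand out
nonce = solve(challenge, difficulty=3)
print(verify(challenge, nonce, difficulty=3))
```

The asymmetry is the design choice here: the client has to spend many hash attempts to find a valid nonce, while the server checks it with one. A single visitor barely notices the delay, but a scraper requesting huge numbers of pages pays the cost every time, and unlike robots.txt the server can refuse to serve the page until the work is done.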
Mwa@thelemmy.club · 18 days ago
The problem is that AI doesn’t follow robots.txt, so Cloudflare and Anubis developed a solution.
oong3Eepa1ae1tahJozoosuu@lemmy.world · 18 days ago
I mean, you could have read the article before asking, it’s literally in there…