---
date: 2023-10-06T10:22:16.581Z
title: Block the Bots that Feed “AI” Models by Scraping Your Website
linkurl: https://neil-clarke.com/block-the-bots-that-feed-ai-models-by-scraping-your-website/
tags:
  - development
  - ai
---

Neil Clarke with an excellent overview of current techniques for blocking all the “AI” web scraping bots from your content, e.g. via robots.txt. The reasons for doing so are numerous:

> “AI” companies think that we should have to opt-out of data-scraping bots that take our work to train their products. [...] These companies should be prevented from using data that they haven’t been given explicit consent for. Opt-out is problematic as it counts on concerned parties hearing about new or modified bots BEFORE their sites are targeted by them. That is simply not practical. [...] The online community is under no responsibility to help them create their products.
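
Here is a minimal Python sketch of the robots.txt approach. It generates a file that disallows the entire site for a few AI-related crawlers, then checks the result with the standard library's `urllib.robotparser`. The user-agent list is an illustrative assumption, not a complete or current list. It uses names the operators published around the time of writing: OpenAI's GPTBot and ChatGPT-User, Common Crawl's CCBot, and Google's Google-Extended token. The `example.com` URL is a placeholder. Keep in mind that robots.txt is only a request, so it does nothing against crawlers that choose to ignore it.

```python
from urllib.robotparser import RobotFileParser

# Illustrative, non-exhaustive list of AI-related crawler tokens.
# Assumption: names as publicly documented by their operators around 2023;
# check each vendor's docs, as names change and new bots appear.
AI_USER_AGENTS = ["GPTBot", "ChatGPT-User", "CCBot", "Google-Extended"]


def build_robots_txt(agents: list[str]) -> str:
    """Return robots.txt content that disallows the whole site for each agent."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    # One blank line between groups, plus a trailing newline at end of file.
    return "\n\n".join(groups) + "\n"


def is_blocked(robots_txt: str, agent: str, url: str = "https://example.com/") -> bool:
    """Use the stdlib parser to check whether `agent` is denied access to `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    robots = build_robots_txt(AI_USER_AGENTS)
    print(robots)
    for agent in AI_USER_AGENTS:
        print(agent, "blocked:", is_blocked(robots, agent))
    # A crawler without its own group (here a regular search bot) stays allowed.
    print("Googlebot blocked:", is_blocked(robots, "Googlebot"))
```

Each bot gets its own `User-agent` group with `Disallow: /`. Crawlers without a matching group, such as Googlebot in this sketch, fall through and remain allowed, so regular search indexing is not affected by these rules.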