mirror of https://github.com/kremalicious/blog.git synced 2024-11-22 18:00:06 +01:00
blog/content/links/2023-10-06-block-the-bots-that-feed-ai-models-by-scraping-your-website.md

---
date: 2023-10-06T10:22:16.581Z
title: Block the Bots that Feed “AI” Models by Scraping Your Website
linkurl: https://neil-clarke.com/block-the-bots-that-feed-ai-models-by-scraping-your-website/
tags:
- development
- ai
---
Neil Clarke has an [excellent overview](https://neil-clarke.com/block-the-bots-that-feed-ai-models-by-scraping-your-website/) of current techniques for keeping "AI" web-scraping bots away from your content, e.g. via `robots.txt`. The reasons for doing so are numerous:
> “AI” companies think that we should have to opt-out of data-scraping bots that take our work to train their products. [...] These companies should be prevented from using data that they haven’t been given explicit consent for. Opt-out is problematic as it counts on concerned parties hearing about new or modified bots BEFORE their sites are targeted by them. That is simply not practical. [...] The online community is under no responsibility to help them create their products.
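
To make the `robots.txt` approach a bit more concrete, here is a minimal sketch, not taken from Neil's post. It assumes you want to turn away two crawlers that publicly identify themselves as `GPTBot` (OpenAI) and `CCBot` (Common Crawl) while leaving everyone else alone. The rule set, the `is_allowed` helper and the `example.com` URL are made up for illustration. It uses Python's standard `urllib.robotparser` to check what the rules would tell a well-behaved bot:

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only: the two user agents are examples, not a complete list.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URL, used only to exercise the rules above.
    url = "https://example.com/some-post/"
    for agent in ("GPTBot", "CCBot", "Mozilla/5.0"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, url) else "blocked"
        print(f"{agent}: {verdict}")
```

Keep in mind that `robots.txt` is only a request. It works for bots that choose to honor it, which is exactly the opt-out problem described in the quote above.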