mirror of
https://github.com/kremalicious/blog.git
synced 2024-12-23 01:30:01 +01:00
15 lines
999 B
Markdown
---
date: 2023-10-06T10:22:16.581Z

title: Block the Bots that Feed “AI” Models by Scraping Your Website
linkurl: https://neil-clarke.com/block-the-bots-that-feed-ai-models-by-scraping-your-website/

tags:
  - development
  - ai
---

Neil Clarke has an [excellent overview](https://neil-clarke.com/block-the-bots-that-feed-ai-models-by-scraping-your-website/) of current techniques for blocking "AI" web scraping bots from your content, e.g. via `robots.txt`. The reasons for doing so are numerous:

> “AI” companies think that we should have to opt-out of data-scraping bots that take our work to train their products. [...] These companies should be prevented from using data that they haven’t been given explicit consent for. Opt-out is problematic as it counts on concerned parties hearing about new or modified bots BEFORE their sites are targeted by them. That is simply not practical. [...] The online community is under no responsibility to help them create their products.
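
To make the `robots.txt` approach a bit more concrete, here is a minimal sketch in Python that generates opt-out rules and checks them with the standard library's `urllib.robotparser`. The user agent list (`GPTBot`, `CCBot`, `Google-Extended`) is only an illustrative assumption and certainly incomplete, the helper names `build_robots_txt` and `is_allowed` are made up for this example, and `example.com` is a placeholder. Keep in mind that `robots.txt` is purely advisory: it only works for crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Illustrative, incomplete list of crawler tokens (an assumption for this sketch).
# Check each vendor's documentation for the tokens they currently use.
AI_USER_AGENTS = ["GPTBot", "CCBot", "Google-Extended"]


def build_robots_txt(user_agents):
    """Return robots.txt text that disallows the whole site for each agent."""
    blocks = [f"User-agent: {agent}\nDisallow: /" for agent in user_agents]
    return "\n\n".join(blocks) + "\n"


def is_allowed(robots_txt, user_agent, url):
    """Check whether a rule-abiding crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


robots_txt = build_robots_txt(AI_USER_AGENTS)
print(robots_txt)

# Listed bots are blocked; agents without a matching rule are still allowed.
print(is_allowed(robots_txt, "GPTBot", "https://example.com/some-post/"))
print(is_allowed(robots_txt, "Mozilla/5.0", "https://example.com/some-post/"))
```

The generated text can be saved as `robots.txt` in the root of your site. As the quote points out, this is opt-out by design, so the list needs updating whenever a new bot shows up.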