mirror of https://github.com/kremalicious/blog.git synced 2024-11-22 18:00:06 +01:00
blog/content/links/2023-10-06-block-the-bots-that-feed-ai-models-by-scraping-your-website.md

---
date: 2023-10-06T10:22:16.581Z
title: Block the Bots that Feed “AI” Models by Scraping Your Website
linkurl: https://neil-clarke.com/block-the-bots-that-feed-ai-models-by-scraping-your-website/
tags:
- development
- ai
---
Neil Clarke with an [excellent overview](https://neil-clarke.com/block-the-bots-that-feed-ai-models-by-scraping-your-website/) of current techniques for blocking "AI" web scraping bots from your content, e.g. via `robots.txt`. The reasons for doing so are numerous:
> “AI” companies think that we should have to opt-out of data-scraping bots that take our work to train their products. [...] These companies should be prevented from using data that they haven’t been given explicit consent for. Opt-out is problematic as it counts on concerned parties hearing about new or modified bots BEFORE their sites are targeted by them. That is simply not practical. [...] The online community is under no responsibility to help them create their products.
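
To make the `robots.txt` approach concrete, here is a minimal sketch in Python that generates blocking rules and checks them with the standard library's `urllib.robotparser`. The user agent names (`GPTBot`, `CCBot`, `Google-Extended`) are illustrative assumptions, not a complete or current list, and `build_robots_txt` and `is_allowed` are hypothetical helper names. See the linked overview for the bots worth blocking right now.

```python
from urllib.robotparser import RobotFileParser

# Illustrative user agents only (an assumption, not an exhaustive list);
# the set of scraping bots changes over time.
AI_BOTS = ["GPTBot", "CCBot", "Google-Extended"]


def build_robots_txt(bots):
    """Return robots.txt text that disallows the whole site for each bot."""
    blocks = [f"User-agent: {bot}\nDisallow: /" for bot in bots]
    return "\n\n".join(blocks) + "\n"


def is_allowed(robots_txt, user_agent, url="https://example.com/"):
    """Check whether the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


robots_txt = build_robots_txt(AI_BOTS)
print(robots_txt)
```

Note that `robots.txt` is only a request: it works solely for bots that choose to honor it, and you have to know a bot's name before you can list it, which is exactly the opt-out problem described in the quote.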