mirror of https://github.com/kremalicious/blog.git synced 2024-11-22 01:46:51 +01:00

block all the ai bots

Matthias Kretschmann 2023-10-06 11:59:26 +01:00
parent 01d2afa000
commit 43e510ba1d
Signed by: m
GPG Key ID: 606EEEF3C479A91F
3 changed files with 33 additions and 1 deletions

@@ -0,0 +1,14 @@
---
date: 2023-10-06T10:22:16.581Z
title: Block the Bots that Feed “AI” Models by Scraping Your Website
linkurl: https://neil-clarke.com/block-the-bots-that-feed-ai-models-by-scraping-your-website/
tags:
- development
- ai
---
Neil Clarke has written an [excellent overview](https://neil-clarke.com/block-the-bots-that-feed-ai-models-by-scraping-your-website/) of current techniques for blocking "AI" web-scraping bots from your content, e.g. via `robots.txt`. The reasons for doing so are numerous:
> “AI” companies think that we should have to opt-out of data-scraping bots that take our work to train their products. [...] These companies should be prevented from using data that they haven’t been given explicit consent for. Opt-out is problematic as it counts on concerned parties hearing about new or modified bots BEFORE their sites are targeted by them. That is simply not practical. [...] The online community is under no responsibility to help them create their products.

@@ -1,7 +1,24 @@
# https://platform.openai.com/docs/gptbot
User-agent: CCBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: GPTBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Omgilibot
Disallow: /
User-agent: Omgili
Disallow: /
User-agent: FacebookBot
Disallow: /
User-agent: *
Disallow: /search/
Disallow: /page/
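
The rules above can be sanity-checked locally before deploying. The following is a minimal sketch, assuming Python's standard-library `urllib.robotparser` is an acceptable stand-in for how a well-behaved crawler reads the file; `example.com` and the helper names `build_parser` and `blocked_agents` are hypothetical and not part of this commit. It only tells you what the file asks for, since a crawler that ignores `robots.txt` is not stopped by it.

```python
from urllib.robotparser import RobotFileParser

# Copy of the robots.txt rules from this commit.
ROBOTS_TXT = """\
# https://platform.openai.com/docs/gptbot
User-agent: CCBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: GPTBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Omgilibot
Disallow: /
User-agent: Omgili
Disallow: /
User-agent: FacebookBot
Disallow: /
User-agent: *
Disallow: /search/
Disallow: /page/
"""

# User agents the commit intends to block everywhere.
AI_AGENTS = [
    "CCBot", "ChatGPT-User", "GPTBot", "Google-Extended",
    "Omgilibot", "Omgili", "FacebookBot",
]


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def blocked_agents(parser: RobotFileParser, agents: list[str], url: str) -> list[str]:
    """Return the agents that are NOT allowed to fetch the given URL."""
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    # example.com is a placeholder host, not the real site.
    post_url = "https://example.com/some-post/"
    print("Blocked on a post:", blocked_agents(parser, AI_AGENTS, post_url))
    print("Generic agent may read a post:", parser.can_fetch("SomeBrowser", post_url))
    print("Generic agent may read /search/:",
          parser.can_fetch("SomeBrowser", "https://example.com/search/foo"))
```

Every listed agent should come back as blocked for ordinary post URLs, while any other agent falls through to the `*` group and is only kept out of `/search/` and `/page/`.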

@@ -78,6 +78,7 @@ const faviconSvg = await getImage({ src: faviconSvgSrc, format: 'svg' })
<meta name="theme-color" content="" />
<meta name="msapplication-TileColor" content="" />
<meta name="robots" content="noai, noimageai" />
<link rel="icon" href="/favicon.ico" sizes="32x32" />
<link rel="icon" href={faviconSvg.src} type="image/svg+xml" />
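
To confirm that the added `robots` meta tag actually ends up in the rendered HTML, a small check along these lines may help. This is a sketch that assumes you have the built page as a string; `RobotsMetaParser` and `robots_directives` are hypothetical names, not part of this repository. Note that `noai` and `noimageai` are not part of any formal standard, so the tag only has an effect on crawlers that choose to honor it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Directives are comma-separated, e.g. "noai, noimageai".
        self.directives.extend(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def robots_directives(html: str) -> list[str]:
    """Return all robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    sample = '<head><meta name="robots" content="noai, noimageai" /></head>'
    found = robots_directives(sample)
    print("noai present:", "noai" in found)
    print("noimageai present:", "noimageai" in found)
```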