bear.nolt.io/113

Preview meta tags from the bear.nolt.io website.

Search Engine Appearance

Google

https://bear.nolt.io/113

Block LLMs from scraping by allowing customization of robots.txt · ʕ•ᴥ•ʔ Bear Feedback

I'm more than happy to have my site indexed by search engines such as Google, Bing, Kagi, etc. But I'd love to have a way to block LLMs, such as ChatGPT, from training on data from my site. As far as I'm aware, the main viable option for this would be robots.txt. Page-level nofollow tags would turn away search engines too, since there's no way to specify user-agents. Additionally, DNS-level anti-scraping products from companies such as Cloudflare only prevent malicious scrapers (e.g. ones looking for emails/phone numbers).
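
To make the robots.txt idea in the request concrete, here is a minimal sketch of per-user-agent rules, checked with Python's standard `urllib.robotparser`. The request itself names no crawler tokens; `GPTBot` is used here as an assumption (it is the user-agent token OpenAI documents for its crawler), and `example.com` is a placeholder rather than a real Bear blog. robots.txt is also only advisory, so this shows what a compliant crawler would conclude, not an enforcement mechanism.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one LLM crawler, allow everyone else.
# "GPTBot" is an assumed example token; the original request names none.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Placeholder URL; any path on the site is covered by "Disallow: /".
url = "https://example.com/blog/post"
blocked = parser.can_fetch("GPTBot", url)      # LLM crawler
allowed = parser.can_fetch("Googlebot", url)   # search engine crawler

print(f"GPTBot may fetch: {blocked}")
print(f"Googlebot may fetch: {allowed}")
```

The per-agent groups are what page-level tags lack in the request's framing: each `User-agent` line scopes the rules that follow it, so search engines fall through to the permissive `*` group.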




  • General Meta Tags

    8
    • title
      Block LLMs from scraping by allowing customization of robots.txt · ʕ•ᴥ•ʔ Bear Feedback
    • cache-control
      private, no-cache, no-store, must-revalidate
    • expires
      0
    • pragma
      no-cache
    • charset
      utf-8
  • Open Graph Meta Tags

    5
    • og:description
      I'm more than happy to have my site indexed by search engines such as Google, Bing, Kagi, etc. But I'd love to have a way to block LLMs, such as ChatGPT, from training on data from my site. As far as I'm aware, the main viable option for this would be robots.txt. Page-level nofollow tags would turn away search engines too, since there's no way to specify user-agents. Additionally, DNS-level anti-scraping products from companies such as Cloudflare only prevent malicious scrapers (e.g. ones looking for emails/phone numbers).
    • og:image
      https://nolt.io/static/dist/images/[email protected]
    • og:title
      Block LLMs from scraping by allowing customization of robots.txt · ʕ•ᴥ•ʔ Bear Feedback
    • og:type
      website
    • og:url
      https://bear.nolt.io/113
  • Twitter Meta Tags

    5
    • twitter:card
      summary
    • twitter:title
      Block LLMs from scraping by allowing customization of robots.txt · ʕ•ᴥ•ʔ Bear Feedback
    • twitter:description
      I'm more than happy to have my site indexed by search engines such as Google, Bing, Kagi, etc. But I'd love to have a way to block LLMs, such as ChatGPT, from training on data from my site. As far as I'm aware, the main viable option for this would be robots.txt. Page-level nofollow tags would turn away search engines too, since there's no way to specify user-agents. Additionally, DNS-level anti-scraping products from companies such as Cloudflare only prevent malicious scrapers (e.g. ones looking for emails/phone numbers).
    • twitter:image
      https://nolt.io/static/dist/images/[email protected]
    • twitter:site
      @TryNolt
  • Item Prop Meta Tags

    3
    • name
      Block LLMs from scraping by allowing customization of robots.txt · ʕ•ᴥ•ʔ Bear Feedback
    • description
      I'm more than happy to have my site indexed by search engines such as Google, Bing, Kagi, etc. But I'd love to have a way to block LLMs, such as ChatGPT, from training on data from my site. As far as I'm aware, the main viable option for this would be robots.txt. Page-level nofollow tags would turn away search engines too, since there's no way to specify user-agents. Additionally, DNS-level anti-scraping products from companies such as Cloudflare only prevent malicious scrapers (e.g. ones looking for emails/phone numbers).
    • image
      https://nolt.io/static/dist/images/[email protected]
  • Link Tags

    2
    • canonical
      https://bear.nolt.io/113
    • shortcut icon
      https://nolt.io/static/dist/images/logo.1034f87571.png
