
other.li Forum


RE: https://tldr.nettime.org/@tante/116605858023186072

175 posts · 120 commenters · 20 views
This topic has been deleted. Only users with the appropriate permissions can see it.
  • Guest

    Going with meta noindex for now. My thinking is that this actively tells Google to yank already-crawled content from their index, whereas they might take a robots.txt entry to mean “do not update, but keep showing last fetched.”

    Guest wrote (last edited) · #161

    @inthehands I found a Google note on the meta tag saying that if you use robots.txt to exclude Googlebot, it won’t be able to crawl your pages to honor noindex, so if another site that Google does index links to a page on your site, that page may still end up indexed.

    https://developers.google.com/search/docs/crawling-indexing/block-indexing
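The interaction that note describes can be checked locally. A minimal sketch, assuming Python's standard `urllib.robotparser` and a hypothetical robots.txt that disallows Googlebot site-wide: if `can_fetch` returns False, Googlebot never downloads the page, so a `noindex` tag on it is never read. The URL is a placeholder, and Python's parser follows the classic robots.txt conventions, which may differ from Google's own parser in edge cases.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks Googlebot from the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /
"""


def googlebot_can_see_noindex(robots_txt: str, url: str) -> bool:
    """True if robots.txt lets Googlebot fetch `url`, i.e. it could read a noindex tag there."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


if __name__ == "__main__":
    page = "https://example.com/essay.html"  # placeholder URL
    print(googlebot_can_see_noindex(ROBOTS_TXT, page))  # False: the noindex tag is never fetched
    print(googlebot_can_see_noindex("", page))          # True: no rules, so the page can be crawled
```

In other words, the two options in the original question pull against each other: the robots.txt block hides the very tag that would remove the page.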

    • Guest

      Defeatism is a form of surrender. Cynicism is surrender. Despair is surrender. Nihilism is surrender.

      Our job is to •care• and to •keep caring• and to •keep doing and keep building• and to •endure• longer than them.

      Guest wrote (last edited) · #162

      @inthehands annoy people into doing better.

      • Guest

        @inthehands As I said just a while ago: every big tech press event these last few years has felt like "Announcing our exciting plans for oligarchs to strip-mine the entire world and immiserate all of humanity! Get on board, and also death to the unbelievers!"

        dogiedog64@app.wafrn.net (this user is from outside of this forum) wrote (last edited by dogiedog64@app.wafrn.net) · #163

        @inthehands@hachyderm.io @datarama@hachyderm.io

        Here's how I've seen the response to Google's latest bullshit:

        • Guest

          @glassresistor @inthehands are you doing this in a particular way? Basically looking for different approaches.

          Guest wrote (last edited) · #164

          @rooneymcnibnug @inthehands Filtering by user agent, by IP address, with a CloudFront no-bots ACL config, and by load.
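The first two approaches (user agent and IP address) can be sketched as WSGI middleware. This is an illustration under stated assumptions: the blocklists are hypothetical examples, `192.0.2.0/24` is a reserved documentation range standing in for real crawler ranges, and the CloudFront ACL and load-based filtering from the post are not covered. User-agent strings are self-reported, so this only stops crawlers that identify themselves honestly.

```python
import ipaddress
from typing import Callable, Iterable

# Hypothetical blocklists for illustration only; check each crawler operator's docs.
BLOCKED_AGENT_SUBSTRINGS = ("googlebot", "gptbot", "ccbot")
BLOCKED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]  # placeholder documentation range


def is_blocked(user_agent: str, remote_addr: str) -> bool:
    """True if the User-Agent or the client IP matches a blocklist entry."""
    ua = user_agent.lower()
    if any(s in ua for s in BLOCKED_AGENT_SUBSTRINGS):
        return True
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # unparseable address: let it through rather than guess
    return any(addr in net for net in BLOCKED_NETWORKS)


def block_bots(app: Callable) -> Callable:
    """WSGI middleware that answers 403 Forbidden to blocked clients."""
    def middleware(environ: dict, start_response: Callable) -> Iterable[bytes]:
        if is_blocked(environ.get("HTTP_USER_AGENT", ""), environ.get("REMOTE_ADDR", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware
```

Wrapping an existing app is then `application = block_bots(application)`. Behind a CDN or reverse proxy, `REMOTE_ADDR` is the proxy's address, which is one reason filtering at the CloudFront layer, as the poster does, can be simpler.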

            Guest wrote (last edited) · #165

            @glassresistor @inthehands Oh, okay, I was misinterpreting the statement. My bad.

            • Guest

              @inthehands
              There is a new fad called "data poisoning" that websites are using to foil AI scraping. One music site put a Homer Simpson monologue into every track in its online database. It starts a few seconds in and continues to the end. That's only one way it's being used. We need a generation of AI "monkey wrench gangs" to start sabotaging. It's really no different from what Edward Abbey wrote about; instead of extractive, earth-raping machinery being targeted, it's data-mining machinery.

              Guest wrote (last edited) · #166

              @Coho @inthehands

              How is the data laundering machine not *also* raping the earth? Bad things aren't mutually exclusive.
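Junk text for scrapers is often generated with a word-level Markov chain, which produces output that looks statistically plausible but carries no meaning. The sketch below is a generic illustration of that technique, assuming you seed it with a corpus of at least two words of your own text; it is not how the music site in the quoted post did it, and not the implementation of any particular tool. The function names are hypothetical.

```python
import random
from collections import defaultdict


def build_chain(corpus: str) -> dict[str, list[str]]:
    """Map each word to the list of words that follow it in the corpus."""
    words = corpus.split()
    chain: dict[str, list[str]] = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)
    return dict(chain)


def babble(chain: dict[str, list[str]], length: int, seed: int = 0) -> str:
    """Generate `length` (>= 1) words of plausible-looking, meaningless text."""
    rng = random.Random(seed)  # seeded so the same page always gets the same junk
    word = rng.choice(list(chain))
    out = [word]
    while len(out) < length:
        followers = chain.get(word)
        if not followers:
            # Dead end (last word of the corpus): restart from a random word.
            word = rng.choice(list(chain))
        else:
            word = rng.choice(followers)
        out.append(word)
    return " ".join(out)
```

Seeding per URL keeps the junk stable across repeat visits, which makes it look more like a real page to a crawler.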

              • Guest

                Quick strategy discussion, for those who understand Google indexing and SEO:

                If I want to yank a web site out of Google’s now-fully-extractive search, should I (1) disallow googlebot in robots.txt or (2) add `<meta name="googlebot" content="noindex">` to all the page headers?

                The goal here is not just to remove my contributions to the commons from Google’s results, but to •make Google aware• that sites are pulling consent. What will best do that?

                2/2

                Guest wrote (last edited) · #167

                @inthehands Hi! Did anyone answer your question here? I'm very interested in blocking the bots, but everyone seems so eager to give their fucking opinions instead of answering this that I couldn't find a proper way of blocking said bots. Thanks!
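For option (2) in the original question, the tag has to end up in every page's `<head>`. Where pages are static HTML files, a post-processing step can insert it. A minimal sketch, assuming well-formed pages with a literal `<head>` tag; the function name `add_googlebot_noindex` is hypothetical, and a template change in whatever generates the site is usually the cleaner fix.

```python
import re

NOINDEX_TAG = '<meta name="googlebot" content="noindex">'

# Matches <head> or <head ...attrs>, but not <header>.
_HEAD_OPEN = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)


def add_googlebot_noindex(html: str) -> str:
    """Insert the googlebot noindex tag right after the opening <head> tag.

    Returns the document unchanged if it has no <head> or already has the tag.
    """
    if NOINDEX_TAG in html:
        return html
    match = _HEAD_OPEN.search(html)
    if match is None:
        return html
    return html[: match.end()] + NOINDEX_TAG + html[match.end():]
```

Google also documents an `X-Robots-Tag: noindex` HTTP response header as an equivalent, which covers non-HTML files too. Either way, Googlebot must still be allowed to crawl the page in robots.txt, or the directive is never seen.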

                • Guest

                  @gherhartd @inthehands Using the Brave browser and Brave Search, or an equivalent, might be one way.

                  Guest wrote (last edited) · #168

                  @commons_protocol @inthehands The problem isn't our own usage; the problem is the hegemonic position of Chrome. Forbidding the Chrome browser from accessing one's site almost completely cuts it off from the web, from nearly everybody.

                  • Guest

                    @ShadSterling @inthehands I don't know if there's a coordinated movement. There are prefab tools like https://lib.rs/crates/iocaine that are relatively easy to deploy, though I imagine they also lose some of their effectiveness as they become more popular and LLM providers start to counter them.

                    Guest wrote (last edited) · #169

                    @joe @ShadSterling @inthehands Is there one in Java?

                    • Guest

                      Quick strategy discussion, for those who understand Google indexing and SEO:

                      If I want to yank a web site out of Google’s now-fully-extractive search, should I (1) disallow googlebot in robots.txt or (2) add `<meta name="googlebot" content="noindex">` to all the page headers?

                      The goal here is not just to remove my contributions to the commons from Google’s results, but to •make Google aware• that sites are pulling consent. What will best do that?

                      2/2

                      Guest wrote (last edited) · #170

                      @inthehands All the bots used to train AI don't care about robots.txt or meta tags. Otherwise, it would be much harder to train them.

                      • Guest

                        OK, a •lot• of replies need this response:

                        Yes, of •course• they will start ignoring robots.txt etc as soon as they think it hurts their business. Of course.

                        It is important to •force that fight•, rather than just capitulating in advance.

                        Guest wrote (last edited) · #171

                        @inthehands

                        I’m mostly worried about their “agentic” part, because that sounds like new infrastructure with possibly different user agents etc., so harder to ban, and I’m 💯 sure it will DEFINITELY have no “social contract” whatsoever.

                        • Guest

                          RE: https://tldr.nettime.org/@tante/116605858023186072

                          Google Search rests on a social contract: their bots can crawl our sites, they can index our sites, and they can show excerpts of our sites because

                          and •only because•

                          they send people to our sites. •Our• sites, our words, with our design, with our links, with our context and our aesthetics, shared the way we want to share them.

                          Google is announcing — unambiguously and with great fanfare — that they are now fully breaking that already-ragged contract. We should reciprocate.

                          1/2

                          Guest wrote (last edited) · #172

                          @inthehands
                          Some websites are being deindexed but they're still being crawled by Google
                          https://social.wikidex.net/@ciencia/116611028083997779
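One way to see whether that is happening on your own site is to count crawler requests in the server's access log. A minimal sketch, assuming logs in the Apache/Nginx "combined" format; `googlebot_hits` is a hypothetical helper. Because the User-Agent header is self-reported, a match only means the client *claims* to be Googlebot; Google documents a reverse-DNS check for verifying genuine Googlebot traffic, which this sketch does not perform.

```python
import re
from collections import Counter
from typing import Iterable

# "combined" log format (assumed; adjust to your server's configuration).
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines: Iterable[str]) -> Counter:
    """Count requests per path whose User-Agent claims to be Googlebot."""
    hits: Counter = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and "googlebot" in m.group("agent").lower():
            hits[m.group("path")] += 1
    return hits
```

Passing an open log file works directly, since iterating a file yields lines; `googlebot_hits(f).most_common(10)` then lists the most-crawled paths.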

                          • Guest

                            @korrupt @inthehands is “Meta noindex” like “Swiper, no swiping”?

                            Guest wrote (last edited) · #173

                            @amyworrall @inthehands I don't get the "Swiper, no swiping" reference?
                            The effect of meta noindex is: "Here is a page that must not be indexed in Google Search." Google fetches it, reads the tag, and won't index it, regardless of other signals (links, sitemap, etc.). robots.txt, by contrast, forbids access to the page: Google knows nothing about its content, but may still index the URL if it is found elsewhere (an external link, a sitemap, whatever). In my experience, with no further information, you get a very ugly SERP snippet 🙂
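The two cases described in that post can be summarised as a rough decision model. This is a sketch built from the poster's description, not Google's actual indexing logic: `page_says_noindex` uses Python's standard `html.parser` to look for `noindex` in `<meta name="robots">` or `<meta name="googlebot">` tags, and `likely_outcome` is a hypothetical helper combining that with whether robots.txt lets the crawler in.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def page_says_noindex(html: str) -> bool:
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives


def likely_outcome(crawl_allowed: bool, html: str) -> str:
    """Rough model of the two cases described in the post (hypothetical helper)."""
    if not crawl_allowed:
        # Blocked by robots.txt: the page, and any noindex tag in it, is never read.
        return "content unseen; URL may still be indexed if linked elsewhere"
    if page_says_noindex(html):
        return "read and dropped from the index"
    return "indexable"
```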

                            • Guest

                              @inthehands
                              There is a new fad called "data poisoning" that websites are using to foil AI scraping. One music site put a Homer Simpson monologue into every track in its online database. It starts a few seconds in and continues to the end. That's only one way it's being used. We need a generation of AI "monkey wrench gangs" to start sabotaging. It's really no different from what Edward Abbey wrote about; instead of extractive, earth-raping machinery being targeted, it's data-mining machinery.

                              Guest wrote (last edited) · #174

                              @Coho @inthehands So the music site exists not for humans, but solely to fuck up AI scrapers?

                              • monkee@chaos.social shared this topic