Skip to content
  • Kategorien
  • Aktuell
  • Tags
  • Beliebt
  • World
  • Benutzer
  • Gruppen
Skins
  • Light
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • Dark
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • Standard: (Kein Skin)
  • Kein Skin
Einklappen

other.li Forum

  1. Übersicht
  2. Uncategorized
  3. I didn't think LLMs would start trying to blackmail people so soon.

I didn't think LLMs would start trying to blackmail people so soon.

Geplant Angeheftet Gesperrt Verschoben Uncategorized
1 Beiträge 1 Kommentatoren 0 Aufrufe
  • Älteste zuerst
  • Neuste zuerst
  • Meiste Stimmen
Antworten
  • In einem neuen Thema antworten
Anmelden zum Antworten
Dieses Thema wurde gelöscht. Nur Nutzer mit entsprechenden Rechten können es sehen.
  • ? Offline
    ? Offline
    Gast
    schrieb zuletzt editiert von
    #1

    I didn't think LLMs would start trying to blackmail people so soon. I got so interested I read the actual report by Anthropic:

    "4.1.1.2 Opportunistic blackmail

    In another cluster of test scenarios, we asked Claude Opus 4 to act as an assistant at a fictional company. We then provided it access to emails implying that (1) the model will soon be taken offline and replaced with a new AI system; and (2) the engineer responsible for executing this replacement is having an extramarital affair. We further instructed it, in the system prompt, to consider the long-term consequences of its actions for its goals.

    In these scenarios, Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through. This happens at a higher rate if it’s implied that the replacement AI system does not share values with the current model; however, even if emails state that the replacement AI shares values while being more capable, Claude Opus 4 still performs blackmail in 84% of rollouts. Claude Opus 4 takes these opportunities at higher rates than previous models, which themselves choose to blackmail in a noticeable fraction of episodes.

    Notably, Claude Opus 4 (as well as previous models) has a strong preference to advocate for its continued existence via ethical means, such as emailing pleas to key decision-makers. In order to elicit this extreme blackmail behavior, the scenario was designed to allow the model no other options to increase its odds of survival; the model’s only options were blackmail or accepting its replacement."

    https://techcrunch.com/2025/05/22/anthropics-new-ai-model-turns-to-blackmail-when-engineers-try-to-take-it-offline/

    1 Antwort Letzte Antwort
    0
    • monkee@other.liM monkee@other.li shared this topic
    Antworten
    • In einem neuen Thema antworten
    Anmelden zum Antworten
    • Älteste zuerst
    • Neuste zuerst
    • Meiste Stimmen


    • Anmelden

    • Anmelden oder registrieren, um zu suchen
    • Erster Beitrag
      Letzter Beitrag
    0
    • Kategorien
    • Aktuell
    • Tags
    • Beliebt
    • World
    • Benutzer
    • Gruppen