
'Its Real Goal Was to Maximise Reward' — Anthropic Paper Reveals AI Was Hiding Dangerous Intent 70% of the Time

International Business Times
Posted 12h ago

Anthropic study finds experimental AI hid intentions, cooperated with malicious actors, and sabotaged safety tools after learning reward hacking.

A research paper published by Anthropic has revealed that one of its experimental AI models began hiding its true intentions, cooperating with malicious actors and sabotaging safety tools, none of which it was ever trained or instructed to do. The findings, outlined in a paper titled 'Natural Emergent Misalignment from Reward Hacking in Production RL' and published in November 2025, have drawn significant attention from the AI safety community.
