This researcher has a new way to measure AI performance. It's BS, literally.

Business Insider, 12h ago

Peter Gostev, AI capability lead at Arena.

Peter Gostev's BullshitBench tests AI models with nonsensical questions to measure how well they detect BS.

Google Gemini 3.0 struggles with BullshitBench, failing to reject nonsense over half the time.
