
This researcher has a new way to measure AI performance. It's BS, literally.

Business Insider
12h ago

Peter Gostev, AI capability lead at Arena.

Peter Gostev's BullshitBench tests AI models with nonsensical questions to see whether they can detect BS. Google Gemini 3.0 struggles with BullshitBench, failing to reject nonsense more than half the time.
