
This researcher has a new way to measure AI performance. It's BS, literally.

Business Insider
Posted 12 hours ago

Peter Gostev, AI capability lead at Arena. (Photo: Peter Gostev)

Peter Gostev's BullshitBench tests whether AI models can detect BS by asking them nonsensical questions. Google Gemini 3.0 struggles with BullshitBench, failing to reject nonsense more than half the time.
