This researcher has a new way to measure AI performance. It's BS, literally.

Source: Business Insider
Posted 12 hours ago

Peter Gostev, AI capability lead at Arena, created BullshitBench, which tests AI models with nonsensical questions to see whether they detect BS. Google's Gemini 3.0 struggles with BullshitBench, failing to reject nonsense more than half the time.
