This researcher has a new way to measure AI performance. It's BS, literally.

Business Insider, 10h ago

Peter Gostev is the AI capability lead at Arena. His BullshitBench tests AI models with nonsensical questions to see whether they can detect BS. Google Gemini 3.0 struggles with BullshitBench, failing to reject nonsense over half the time.
