Human Benchmark Record

With AI models clobbering every benchmark, it's time for human evaluation

Artificial intelligence has traditionally advanced through automatic accuracy tests in tasks meant to approximate human knowledge. Carefully crafted benchmark tests such as The General Language ...

Morningstar

Sup AI Sets New Benchmark Record with 52.15% on Humanity's Last Exam

Important Disclosure: This is an independent evaluation conducted by Sup AI and is not officially endorsed, validated, or recognized by the Center for AI Safety, Scale AI, or the HLE benchmark ...
