News
Thirty of the world’s most renowned mathematicians traveled to Berkeley, Calif., with some coming from as far away as the U.K ...
Anthropic's Claude 4 models show particular strength in coding and reasoning tasks, but lag behind in multimodality and ...
This was because OpenAI claimed a 25% score on the benchmark, but when another company tested it, they found that the o3 model could answer only 10% of FrontierMath problems. OpenAI is only one of the ...
OpenAI introduced o3 in December, stating that the model could solve approximately 25% of questions on FrontierMath, a difficult math problem set. Epoch AI, the research institute behind ...
Dubbed FrontierMath, the new AI benchmark tests large language models (LLMs) on their capability for reasoning and mathematical problem-solving. The AI firm claims that existing math benchmarks ...
Hosted on MSN · 29d
Unravelling the Stargate spin
People realised that OpenAI funded development of the FrontierMath benchmark, which Epoch AI didn’t initially disclose. OpenAI insists it didn’t train on the benchmark. 30% of game developers ...
A study published four days ago introduced a new benchmark called FrontierMath—a test with hundreds of very advanced math reasoning problems—to evaluate AI models. Researchers tested six top ...