News

Thirty of the world’s most renowned mathematicians traveled to Berkeley, Calif., with some coming from as far away as the U.K. ...
Mathematicians have been enthralled with artificial intelligence that can solve difficult math problems. And some developers ...
Anthropic's Claude 4 models show particular strength in coding and reasoning tasks, but lag behind in multimodality and ...
This was because OpenAI claimed a 25% score on the benchmark, but when another company ran its own tests, it found that the o3 model could answer only about 10% of FrontierMath problems. OpenAI is only one of the ...
The challenges faced by LM Arena are not isolated. Other benchmarks, such as FrontierMath and ARC-AGI, have also been criticized for similar shortcomings. These issues highlight systemic problems ...
When OpenAI unveiled o3 in December, the company claimed the model could answer just over a fourth of questions on FrontierMath, a challenging set of math problems. That score blew the competition ...
OpenAI’s o3: AI Benchmark Discrepancy Reveals Gaps in Performance Claims. The FrontierMath benchmark from Epoch AI tests generative models on difficult math problems.
At the time of its introduction, o3 was said to be able to solve more than 25% of questions on FrontierMath, a dataset designed to test complex mathematical reasoning. This ...