News

Mathematicians have been enthralled by artificial intelligence that can solve difficult math problems. And some developers ...
Anthropic's Claude 4 models show particular strength in coding and reasoning tasks, but lag behind in multimodality and ...
This was because OpenAI claimed a 25% score on the benchmark, but when another company tested the model, it found that o3 could answer only about 10% of FrontierMath problems. OpenAI is only one of the ...
Making the situation worse, several benchmarks, most notably FrontierMath and Chatbot Arena, have recently come under fire for an alleged lack of transparency. Nevertheless, benchmarks still play ...
The challenges faced by LM Arena are not isolated. Other benchmarks, such as FrontierMath and ARC-AGI, have also been criticized for similar shortcomings. These issues highlight systemic problems ...
When OpenAI unveiled o3 in December, the company claimed the model could answer just over a quarter of the questions on FrontierMath, a challenging set of math problems. That score blew the competition ...
OpenAI’s o3: AI Benchmark Discrepancy Reveals Gaps in Performance Claims. The FrontierMath benchmark from Epoch AI tests generative models on difficult math problems.
At the time of its introduction, o3 was said to be able to solve more than 25% of questions on FrontierMath, a dataset designed to test complex mathematical reasoning. This ...
The company made significant claims about the capabilities of its o3 model, which it unveiled last year, including its ability to solve complex math problems from FrontierMath and more.