frontiermath - Search News

News

AI’s math problem: FrontierMath benchmark shows how far technology still has to go

A groundbreaking new benchmark, FrontierMath, is exposing just how far today’s AI is from mastering the complexities of higher mathematics. Developed by the research group Epoch AI, FrontierMath ...

TechRepublic · 1 month ago

OpenAI’s o3: AI Benchmark Discrepancy Reveals Gaps in Performance Claims

The FrontierMath benchmark from Epoch AI tests generative models on difficult math problems.

eWeek · 6 months ago

FrontierMath Benchmark Exposes AI Struggles in Advanced Math

A trailblazing new benchmark from research firm Epoch AI called FrontierMath found that even today’s most advanced AI systems, including GPT-4o and Gemini 1.5 Pro, solved less than 2 percent of ...

Searchenginejournal.com · 4 months ago

OpenAI Secretly Funded Benchmarking Dataset Linked To o3 Model

Revelations that OpenAI secretly funded and had access to the FrontierMath benchmarking dataset are raising concerns about whether it was used to train its o3 AI reasoning model ...

TechCrunch · 4 months ago

AI benchmarking organization criticized for waiting to disclose funding from OpenAI

Epoch AI, a nonprofit primarily funded by Open Philanthropy, a research and grantmaking foundation, revealed on December 20 that OpenAI had supported the creation of FrontierMath. FrontierMath ...

Ars Technica · 6 months ago

New secret math benchmark stumps AI models and PhDs alike

On Friday, research organization Epoch AI released FrontierMath, a new mathematics benchmark that has been turning heads in the AI world because it contains hundreds of expert-level problems that ...

Hosted on MSN · 4 months ago

"We made a mistake in not being more transparent": OpenAI secretly accessed benchmark data, raising questions about the AI model's supposedly "high scores" — after Sam Altman ...

In the next couple of weeks, OpenAI is slated to unveil o3 mini, the smaller version of its o3 series with advanced reasoning capabilities across math, science, and coding. CEO Sam Altman claims ...

Hosted on MSN · 1 month ago

OpenAI’s o3 AI model scores lower on a benchmark than the company initially implied

When OpenAI unveiled o3 in December, the company claimed the model could answer just over a fourth of questions on FrontierMath, a challenging set of math problems. That score blew the competition ...

Time · 5 months ago

AI Models Are Getting Smarter. New Tests Are Racing to Catch Up

In November, the nonprofit research institute Epoch AI announced a set of exceptionally challenging math questions developed in collaboration with leading mathematicians, called FrontierMath ...

GIGAZINE · 5 months ago

Mathematicians talk about the shock of OpenAI's o3 model scoring 25.2% on the ultra-difficult math dataset 'FrontierMath'

Kevin Buzzard, a mathematician and professor of pure mathematics at Imperial College London, published a blog post explaining how OpenAI's o3 model scored 25.2% on the FrontierMath problem dataset.
