Reinforcement Learning

News

China AI rising: Xiaomi releases new MiMo-7B models as DeepSeek upgrades its Prover math AI

Xiaomi Corp. today released MiMo-7B, a new family of reasoning models that it claims can outperform OpenAI’s o1-mini at some ...

Tech Xplore · 6h

Reinforcement learning boosts reasoning skills in new diffusion-based language model d1

A team of AI researchers at the University of California, Los Angeles, working with a colleague from Meta AI, has introduced d1, a diffusion-large-language-model-based framework that has been improved ...

Psychology Today · 6h

Why AI Gets Learning Right and Cognitive Science Doesn’t

When machines fall short, we adjust. When students do, we blame. Here's what that says about learning and instruction.

Beyond autocomplete: Reasoning models raise the bar for generative AI

Many experts believe reasoning models are the future of generative AI because they’re better at handling complexity and less ...

OpenAI rolls back update that made ChatGPT a sycophantic mess

GPT-4o is not a new model—OpenAI released it almost a year ago, but the company occasionally releases revised versions of ...

The Information · 1d · Opinion

XAI Investors on How The Startup Can Win; Is Reinforcement Learning Over?

Before we get into today’s column, I’d like to give a big thank you to all our subscribers who joined us at our “Financing ...

Devdiscourse · 1d

Redesigning alignment: AI must evolve with empathy to safeguard humanity

Current strategies like reinforcement learning from human feedback (RLHF) and scalable oversight hinge on the assumption that ...

AI Is Using Your Likes to Get Inside Your Head

Liking features on social media can provide troves of data about human behavior to AI models. But as AI gets smarter, will it ...

30 seconds vs. 3: The d1 reasoning framework that’s slashing AI response times

Researchers from UCLA and Meta AI have introduced d1, a novel framework using reinforcement learning (RL) to significantly enhance the reasoning capabilities of diffusion-based large language models ...

Interesting Engineering · 2d

Video: China's humanoid robot walks like human after mastering smart learning

Adam, a next-gen humanoid robot, uses advanced reinforcement learning to master human-like movement across dynamic terrains ...

Tech Xplore · 2d

Breaking the spurious link: How causal models fix offline reinforcement learning's generalization problem

Researchers from Nanjing University and Carnegie Mellon University have introduced an AI approach that improves how machines learn from past data—a process known as offline reinforcement learning.

Is ‘The Era of Experience’ Upon Us? Researchers Propose AI Agents Learn From the World

Computer scientist David Silver was a key developer behind AlphaGo, the pivotal Go-playing program that defeated world ...
