There are a few directions for OpenEvolve to take:
- Train LLMs to express uncertainty (or indeterminacy) more clearly, and don't penalize them outright for abstaining (an analogy is how SAT MCQs penalize guessing but not blank answers; see the scoring sketch after this list), OR better yet...
- Train LLMs to give situational/conditional answers, and build adaptive benchmarks around how those answers match up against ground truth. This would require a lot of work, since few benchmarks are this flexible.
- Train LLMs in agentic environments to stop overthinking, favor exploratory analysis, and ask tougher questions
- Discover effective toolkits that push LLM agents toward more investigative action, rather than overthinking and acting slowly
- Accelerate and remix the "sleep time" and "Intuitor" training methods, so that the target performance requires less compute, or the same compute yields better performance
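As a rough illustration of the first two points, here is a minimal sketch of an abstention-aware scoring rule. `score_answer` and the `UNSURE` abstain token are hypothetical names, and the SAT-style penalty is just one possible choice, not a proposal for any specific benchmark:

```python
# Minimal sketch (hypothetical names: `score_answer`, the `UNSURE` abstain token):
# reward correct answers, give a neutral score for an explicit abstention, and
# apply a SAT-style guessing penalty only to wrong answers, so that expressing
# uncertainty is never the worst-scoring option.
def score_answer(prediction: str, ground_truth: str,
                 n_choices: int = 4, abstain_token: str = "UNSURE") -> float:
    if prediction == abstain_token:
        return 0.0                       # uncertainty is not punished
    if prediction == ground_truth:
        return 1.0                       # full credit for a correct answer
    return -1.0 / (n_choices - 1)        # random guessing has expected value ~0


if __name__ == "__main__":
    print(score_answer("B", "B"))        # 1.0
    print(score_answer("UNSURE", "B"))   # 0.0
    print(score_answer("C", "B"))        # -0.333...
```

A conditional-answer variant would score each branch of an answer against the ground truth for the condition it names, but that needs benchmarks that actually record those conditions.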
This is the counterpart of the whole "free-range research" goal, since we need to make sure it does not become ineffective and waste compute hammering away at the same mistakes. Here are some coding agent analogues to these problems: #197
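As one concrete analogue, here is a sketch (a hypothetical `FailureMemo` class, not an OpenEvolve API) of de-duplicating candidate edits that have already failed, so an evolutionary/agentic loop doesn't re-evaluate the same mistake:

```python
import hashlib

# Hypothetical sketch: remember candidate edits that already failed so the loop
# can skip re-evaluating them instead of burning compute on the same mistake.
class FailureMemo:
    def __init__(self) -> None:
        self._failed: dict[str, str] = {}   # diff hash -> error signature

    @staticmethod
    def _key(candidate_diff: str) -> str:
        return hashlib.sha256(candidate_diff.encode()).hexdigest()

    def already_failed(self, candidate_diff: str) -> bool:
        return self._key(candidate_diff) in self._failed

    def record_failure(self, candidate_diff: str, error_signature: str) -> None:
        self._failed[self._key(candidate_diff)] = error_signature
```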
Tools that implement Socratic dialogue could also help, such that a smaller LLM with a different role acts as a control valve for the main LLM, spotting common issues of overthinking or improper reasoning: https://github.com/im-knots/the-academy
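A minimal sketch of what such a critic loop could look like, assuming a generic `call_llm` chat helper (a placeholder, not the-academy's API); the prompts and protocol are illustrative only:

```python
# Illustrative sketch only: `call_llm` is a stand-in for whatever chat-completion
# client the project uses; the "Socratic critic" protocol here is an assumption.
def call_llm(model: str, messages: list[dict]) -> str:
    raise NotImplementedError("wire this to your chat-completion client")

CRITIC_PROMPT = (
    "You are a Socratic critic. Read the assistant's reasoning below and ask one "
    "pointed question if you see overthinking (circling, redundant re-derivation) "
    "or a logical gap. Reply 'PROCEED' if the reasoning is sound."
)

def socratic_check(main_model: str, critic_model: str, task: str, max_rounds: int = 3) -> str:
    draft = call_llm(main_model, [{"role": "user", "content": task}])
    for _ in range(max_rounds):
        verdict = call_llm(critic_model, [
            {"role": "system", "content": CRITIC_PROMPT},
            {"role": "user", "content": draft},
        ])
        if verdict.strip().upper().startswith("PROCEED"):
            break  # the valve opens: stop deliberating and commit to the answer
        # otherwise feed the critic's question back and ask for a shorter, revised answer
        draft = call_llm(main_model, [
            {"role": "user", "content": f"{task}\n\nA reviewer asks: {verdict}\n"
                                        "Answer the question briefly and commit to an action."},
        ])
    return draft
```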
Overthinking has been known to be an issue for reasoning models, so some have hacked around it by creating a "self-braking" mechanism, which feels like a bit of a bodge when the model is trained on a biased form of GRPO: https://github.com/ZJU-REAL/Self-Braking-Tuning
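Not the Self-Braking-Tuning recipe itself, but a minimal sketch of what "biasing" GRPO against long traces can look like: fold a length penalty into each sampled trace's reward before the group-relative normalization. The function name and coefficient are illustrative assumptions.

```python
import numpy as np

# Minimal illustration, NOT the Self-Braking-Tuning recipe: bias GRPO-style
# training against overlong reasoning by penalizing trace length in the reward
# before computing group-relative advantages.
def biased_group_advantages(rewards, trace_lengths, length_coef=0.1):
    r = np.asarray(rewards, dtype=float)
    lens = np.asarray(trace_lengths, dtype=float)
    # penalize traces that are longer than the group average (z-scored length)
    penalized = r - length_coef * (lens - lens.mean()) / (lens.std() + 1e-8)
    # standard GRPO-style normalization: advantage relative to the sampled group
    return (penalized - penalized.mean()) / (penalized.std() + 1e-8)


if __name__ == "__main__":
    # two correct traces (one short, one very long) and one wrong trace
    print(biased_group_advantages(rewards=[1.0, 1.0, 0.0],
                                  trace_lengths=[200, 1200, 300]))
```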
And sometimes even "spurious rewards" are enough for LLMs to self-correct, which has led people to build implicit reward systems that resemble mental simulation, and passive rewiring that resembles daydreaming: https://github.com/ruixin31/Spurious_Rewards https://github.com/sunblaze-ucb/Intuitor https://github.com/letta-ai/sleep-time-compute
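For the implicit-reward side, here is a rough sketch loosely in the spirit of Intuitor's self-certainty signal: score a generation by how far its next-token distributions sit from uniform, with no external ground truth. The function name and exact formulation are assumptions, not the repo's implementation.

```python
import torch
import torch.nn.functional as F

# Sketch of a confidence-style implicit reward: average KL divergence between the
# model's next-token distributions and the uniform distribution over the vocab.
# Higher means the model is more certain about its own generation.
def self_certainty_reward(logits: torch.Tensor) -> torch.Tensor:
    """logits: (seq_len, vocab_size) for the generated tokens only."""
    log_probs = F.log_softmax(logits, dim=-1)
    vocab = logits.size(-1)
    # KL(p || uniform) = sum_i p_i * log(p_i * V) = sum_i p_i * log p_i + log V
    kl_from_uniform = (log_probs.exp() * log_probs).sum(-1) + torch.log(
        torch.tensor(float(vocab))
    )
    return kl_from_uniform.mean()
```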
Cross-reference tianyi-lab/MiP-Overthinking#2