Evolution vs. Parallel: Is the evolution process really necessary? #177
Replies: 4 comments
-
Thanks for trying this experiment, it is very interesting. I think this "parallel" implementation still provides the LLM with:
This is essentially evolution with population=1 and a fixed parent, not a fully parallel generation. A more appropriate parallel baseline would:
The similar performance makes sense because we are comparing full evolution vs. minimal evolution with a good starting point. The LLM is powerful enough to improve when given a working example + metrics, which explains the 0.9927 vs 0.9924 scores. It would be interesting to try with other examples like the one in https://github.com/codelion/openevolve/tree/main/examples/mlx_metal_kernel_opt or https://github.com/codelion/openevolve/tree/main/examples/signal_processing to see if this approach always leads to the same best_program.py but converges much faster than the islands model. Btw, I originally didn't have an islands model when I did the circle packing example; it was added only recently after this bug was reported (#40), so the original replication didn't use the islands model either. I believe what this experiment demonstrates is that even minimal evolutionary pressure can be effective, not that evolution is unnecessary.
-
Something to consider: instead of just "best of N", could you show median + 1SD ranges, so that we can compare how each approach handles the worse cases?
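A minimal sketch of that summary, assuming each approach is repeated several times and the final score of every run is collected in a list. The `summarize` helper and the score lists are hypothetical placeholders, not numbers from this experiment:

```python
import statistics


def summarize(scores):
    """Return the best score, the median, and the sample standard deviation."""
    return {
        "best": max(scores),
        "median": statistics.median(scores),
        "stdev": statistics.stdev(scores),  # sample SD, needs at least 2 scores
    }


# Placeholder values for illustration only (NOT results from this thread).
evolve_runs = [0.90, 0.95, 0.99]
parallel_runs = [0.70, 0.95, 0.99]

for name, runs in (("evolve", evolve_runs), ("parallel", parallel_runs)):
    s = summarize(runs)
    low, high = s["median"] - s["stdev"], s["median"] + s["stdev"]
    print(f"{name}: best={s['best']:.4f} median={s['median']:.4f} "
          f"1SD range=[{low:.4f}, {high:.4f}]")
```

Two approaches can share the same best score while having very different spreads, which is the case a best-of-N comparison hides.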
-
Perhaps you can try more difficult problems or use a weaker LLM. At the least, you shouldn't let the score reach the upper limit.
-
@yangnianboy ideally it should have a bigger basket of problems so things can smooth out.
-
To verify whether the evolving process helps, I compared it with a baseline parallel ablation that only includes the initial program as a prompt for each iteration. I selected the best-of-N as the best program and verified this on the circle_packing problem. Following a two-stage setting, I used the initial program in stage 1, and in stage 2, I used the best program from checkpoint 100 as the initial program. The results show that even without the evaluation process, history, and evolution tree, the system performs well.
Best Evolve score: 0.9927
Best Parallel score: 0.9924
This ablation study raises doubts about the necessity of incorporating the evolutionary process.
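The best-of-N selection described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code used in the experiment: `best_of_n`, `generate`, and `evaluate` are hypothetical names, where `generate` stands in for one LLM call prompted with the fixed parent and `evaluate` stands in for the problem's scoring function.

```python
from typing import Callable, Tuple


def best_of_n(
    parent: str,
    generate: Callable[[str], str],    # hypothetical: one LLM call from the parent
    evaluate: Callable[[str], float],  # hypothetical: scores a candidate program
    n: int,
) -> Tuple[str, float]:
    """Generate n candidates from the same fixed parent and keep the best one."""
    best_program, best_score = None, float("-inf")
    for _ in range(n):
        # Every prompt contains only the fixed parent: no history, no evolution tree.
        candidate = generate(parent)
        score = evaluate(candidate)
        if score > best_score:
            best_program, best_score = candidate, score
    return best_program, best_score


# Toy stand-ins so the sketch runs without an LLM.
program, score = best_of_n("ab", lambda p: p + "x", lambda c: float(len(c)), n=3)
```

In a two-stage setting, the same function would simply be called again in stage 2 with a different `parent`.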
The parallel template is:
This is the code for the parallel run: