Symbolic Regression and Feature Engineering #206

BradKML · 2025-08-13T02:46:45Z

Something has been itching in my head about "feature engineering": a lot of the time it feels like banging rocks together. When Exploratory Data Analysis (EDA) is too hard or annoying to deal with, AutoML seems like a simple solution, but it usually takes too long. Here are some examples of the existing tooling:

Microsoft phased out their open AutoML tooling https://github.com/microsoft/nni
Alteryx has a good suite (same people behind EvalML) https://github.com/alteryx/featuretools
https://github.com/mljar/mljar-supervised
Standalone libraries exist https://github.com/feature-engine/feature_engine
NVIDIA tried to beat Microsoft https://github.com/NVIDIA-Merlin/NVTabular https://github.com/NVIDIA-Merlin/Merlin
Google too https://github.com/google/temporian
Another toolset https://github.com/AutoViML/featurewiz

The field of Symbolic Regression seems to help with this task. It was originally created to find compact, human-readable equations that fit the data, instead of using neural networks for predictions. Elegance is the key goal, but it can also be applied to tabular data.
Some of the libraries include (assuming we exclude quants):

Python and Julia https://github.com/MilesCranmer/PySR
People still use Fortran (!?) https://github.com/rouyang2017/SISSO
Legacy libraries for reference https://github.com/Ambrosys/glyph
Scikit layer https://github.com/heal-research/pyoperon
AutoML (thankfully) https://github.com/hftsoi/symbolfit
https://github.com/MaxHalford/xgp

There are roughly 3 different approaches to the solution (and probably all of them need to be benchmarked with SRBench https://github.com/cavalab/srbench):

LLM-based solutions (most popular now, least explored)

Genetic programming (most explored, more popular in the past)

"Neuro-symbolic" (least popular, not as explored) and neural networks
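
To make the genetic programming approach concrete, here is a minimal sketch of a mutation-only evolutionary search over expression trees. It is a toy under stated assumptions, not how PySR, Operon, or OpenEvolve work internally: the operator set, the tuple-based tree encoding, the parsimony penalty, the size cap, and all function names (`random_expr`, `mutate`, `evolve`, ...) are made up for illustration, and the target `x*x + x` is synthetic.

```python
import math
import operator
import random

# Tiny operator vocabulary (an assumption; real libraries use far richer sets).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
MAX_SIZE = 30  # hypothetical cap on tree size to keep bloat in check


def random_expr(rng, depth=2):
    """Build a random tree: 'x', a float constant, or (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.6 else float(rng.randint(-2, 2))
    op = rng.choice(sorted(OPS))
    return (op, random_expr(rng, depth - 1), random_expr(rng, depth - 1))


def evaluate(expr, x):
    """Evaluate an expression tree at a single value of x."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if expr == "x" else expr


def size(expr):
    """Number of nodes in the tree, used as a complexity measure."""
    if isinstance(expr, tuple):
        return 1 + size(expr[1]) + size(expr[2])
    return 1


def mutate(expr, rng):
    """Replace one randomly chosen subtree with a fresh random subtree."""
    if not isinstance(expr, tuple) or rng.random() < 0.3:
        return random_expr(rng)
    op, left, right = expr
    if rng.random() < 0.5:
        return (op, mutate(left, rng), right)
    return (op, left, mutate(right, rng))


def fitness(expr, xs, ys, parsimony=0.01):
    """Mean squared error plus a small penalty per node (lower is better)."""
    total = 0.0
    for x, y in zip(xs, ys):
        diff = evaluate(expr, x) - y
        total += diff * diff
    score = total / len(xs) + parsimony * size(expr)
    return score if math.isfinite(score) else math.inf


def evolve(xs, ys, generations=200, pop_size=50, seed=0):
    """Keep the best fifth of the population, refill with mutated copies."""
    rng = random.Random(seed)
    population = [random_expr(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda e: fitness(e, xs, ys))
        history.append(fitness(population[0], xs, ys))
        survivors = population[: pop_size // 5]
        children = []
        while len(survivors) + len(children) < pop_size:
            child = mutate(rng.choice(survivors), rng)
            # Oversized children are discarded in favour of a fresh tree.
            children.append(child if size(child) <= MAX_SIZE else random_expr(rng))
        population = survivors + children
    population.sort(key=lambda e: fitness(e, xs, ys))
    history.append(fitness(population[0], xs, ys))
    return population[0], history


# Synthetic target: y = x*x + x sampled on a small grid.
xs = [i / 2 for i in range(-4, 5)]
ys = [x * x + x for x in xs]
best, history = evolve(xs, ys)
print("best expression:", best)
print("final fitness:", history[-1])
```

Because the survivors are carried over unchanged, the best fitness can never get worse from one generation to the next. The parsimony term is what encodes the "elegance" preference: two expressions with the same error are ranked by size.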

The goals for OpenEvolve in this case would be:

Accelerate Symbolic Regression optimizers based on SRBench, potential future benchmarks, as well as whatever the user wants to throw at it

Gather tabular datasets from OpenML and see if Symbolic Regression can be turned into a feature engineering tool that enhances model prediction, while minimizing computational complexity and time cost (assuming that AutoML is also an option)
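
As a rough illustration of what "Symbolic Regression as a feature engineering tool" could look like, the sketch below enumerates simple symbolic combinations of columns (pairwise products and ratios) and ranks them by absolute Pearson correlation with the target. This is loosely in the spirit of the correlation-based screening idea behind SISSO, but it is not that library's algorithm or API. The column names, the synthetic data, and the helper names (`candidate_features`, `screen`) are all assumptions for illustration; a real experiment would load an OpenML dataset and score the features by downstream model performance.

```python
import itertools
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def candidate_features(columns):
    """Yield (name, values) for raw columns plus pairwise products and ratios."""
    for name, values in columns.items():
        yield name, values
    for (name_a, a), (name_b, b) in itertools.combinations(columns.items(), 2):
        yield f"{name_a}*{name_b}", [x * y for x, y in zip(a, b)]
        if all(y != 0 for y in b):  # skip ratios that would divide by zero
            yield f"{name_a}/{name_b}", [x / y for x, y in zip(a, b)]


def screen(columns, target, top_k=3):
    """Return the top_k (name, |correlation|) pairs, best first."""
    scored = [
        (abs(pearson(values, target)), name)
        for name, values in candidate_features(columns)
    ]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:top_k]]


# Synthetic table (hypothetical columns): the target is exactly mass * accel,
# and "noise" is an irrelevant distractor column.
rng = random.Random(0)
columns = {
    "mass": [rng.uniform(1, 5) for _ in range(200)],
    "accel": [rng.uniform(1, 5) for _ in range(200)],
    "noise": [rng.uniform(1, 5) for _ in range(200)],
}
target = [m * a for m, a in zip(columns["mass"], columns["accel"])]

for name, score in screen(columns, target):
    print(f"{name}: {score:.3f}")
```

On this synthetic table the product `mass*accel` ranks first, since it reproduces the target exactly. Exhaustive enumeration like this grows quickly with the number of columns and operators, which is where a guided search would be expected to help with the time cost.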
