A Greedy Search Tree Heuristic for Symbolic Regression

Symbolic Regression tries to find a mathematical expression that describes the relationship of a set of explanatory variables to a measured variable. The main objective is to find a model that minimizes the error and, optionally, that also minimizes the expression size. A smaller expression can be seen as an interpretable model considered a reliable decision model. This is often performed with Genetic Programming which represents their solution as expression trees. The shortcoming of this algorithm lies on this representation that defines a rugged search space and contains expressions of any size and difficulty. These pose as a challenge to find the optimal solution under computational constraints. This paper introduces a new data structure, called Interaction-Transformation (IT), that constrains the search space in order to exclude a region of larger and more complicated expressions. In order to test this data structure, it was also introduced an heuristic called SymTree. The obtained results show evidence that SymTree are capable of obtaining the optimal solution whenever the target function is within the search space of the IT data structure and competitive results when it is not. Overall, the algorithm found a good compromise between accuracy and simplicity for all the generated models.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset