Model tree analysis with randomly generated and evolved trees (M-TARGET)
Tree-structured modeling is a data mining technique that recursively partitions a data set into relatively homogeneous subgroups in order to make more accurate predictions on future observations. One of the earliest decision tree induction algorithms, CART (Classification and Regression Trees) (Breiman, Friedman, Olshen, and Stone 1984), has known problems, including greediness, split selection bias, and simplistic formation of classification and prediction rules in the terminal leaf nodes. Improvements have been proposed in later algorithms, including Bayesian CART (Chipman, George, and McCulloch 1998), Bayesian Treed Regression (Chipman, George, and McCulloch 2002), TARGET (Tree Analysis with Randomly Generated and Evolved Trees) (Fan and Gray 2005; Gray and Fan 2008), and Treed Regression (Alexander and Grimshaw 2006). TARGET, Bayesian CART, and Bayesian Treed Regression introduced stochastically driven search methods that explore the tree space in a non-greedy fashion. These methods allow the tree space to be searched with global optimality in mind, rather than by following a series of locally optimal splits. Treed Regression and Bayesian Treed Regression add models in the leaf nodes to predict and classify new observations, instead of using the mean or the weighted majority vote as in traditional regression and classification trees, respectively. This dissertation proposes a new method called M-TARGET (Model Tree Analysis with Randomly Generated and Evolved Trees), which combines the stochastic nature of TARGET with the enhancement of models in the leaf nodes to improve prediction and classification accuracy. Comparisons with Treed Regression and Bayesian Treed Regression on real data sets show favorable results with regard to RMSE and tree size, suggesting that M-TARGET is a viable approach to decision tree modeling.
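The difference between a traditional regression tree leaf (which predicts the mean) and a leaf that holds a model can be illustrated with a minimal sketch. This is not the M-TARGET algorithm: it assumes a single, hand-picked split rather than a stochastic search over trees, uses a simple one-variable least-squares line as the leaf model, and runs on synthetic data. All function names (`fit_mean`, `fit_line`, `fit_stump`, `rmse`) are hypothetical and chosen only for illustration.

```python
import math

def fit_mean(xs, ys):
    """Traditional regression-tree leaf: predict the mean response."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Model leaf (illustrative assumption): simple least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # no spread in x: fall back to the mean
        return lambda x: mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def fit_stump(xs, ys, threshold, leaf_fitter):
    """One-split tree with a fixed threshold; each leaf is fit separately."""
    left = [(x, y) for x, y in zip(xs, ys) if x <= threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x > threshold]
    left_model = leaf_fitter(*zip(*left))
    right_model = leaf_fitter(*zip(*right))
    return lambda x: left_model(x) if x <= threshold else right_model(x)

def rmse(model, xs, ys):
    """Root mean squared error of a fitted model on (xs, ys)."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

# Synthetic piecewise-linear data with a change point at x = 1.0.
xs = [i / 10 for i in range(20)]
ys = [2 * x if x <= 1.0 else 5 - 3 * x for x in xs]

rmse_mean = rmse(fit_stump(xs, ys, 1.0, fit_mean), xs, ys)
rmse_linear = rmse(fit_stump(xs, ys, 1.0, fit_line), xs, ys)
print(f"mean leaves:   RMSE = {rmse_mean:.3f}")
print(f"linear leaves: RMSE = {rmse_linear:.3f}")
```

With the same single split, the tree with linear leaves fits this data far more closely than the tree with mean leaves, which would need many more splits to approximate each linear segment. This is the intuition behind why leaf models may reduce both RMSE and tree size.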