Given a complex tree ensemble such as a Random Forest, the authors construct a single decision tree of minimum depth that represents the same function over the entire feature space. In other words, you input a Random Forest, which is an ensemble of trees, and get as output a single decision tree with exactly the same behaviour over the entire feature space.
This simpler representation gives greater interpretability without a loss in accuracy. A simple yet powerful model!
Cell (z): when you split a feature at one or more values, you create axis-aligned hyperplanes. A cell is the box contained between two consecutive hyperplanes along each feature.
Region (z_l, z_r): a region is a collection of cells in the feature space. Its span is given by a bottom-left cell z_l and a top-right cell z_r, and the region contains those two cells and every cell between them.
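As a toy illustration (the split values below are made up, just to show what cells and regions look like in code):

```python
import itertools

# Toy illustration (split values are made up): the hyperplanes created by an
# ensemble's splits cut the feature space into a grid of cells.
splits = {0: [2.5, 7.0],   # feature 0 is split at 2.5 and 7.0 -> 3 intervals
          1: [1.0]}        # feature 1 is split at 1.0         -> 2 intervals

def cell_box(z):
    """Map a cell index vector z to its box (lower, upper) per feature."""
    box = []
    for j, idx in enumerate(z):
        edges = [float("-inf")] + splits[j] + [float("inf")]
        box.append((edges[idx], edges[idx + 1]))
    return box

def cells_in_region(z_l, z_r):
    """All cells between the bottom-left cell z_l and top-right cell z_r."""
    return list(itertools.product(*(range(lo, hi + 1) for lo, hi in zip(z_l, z_r))))

print(cell_box((1, 0)))                 # [(2.5, 7.0), (-inf, 1.0)]
print(cells_in_region((0, 0), (1, 1)))  # the 4 cells spanned by those two corners
```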
The authors use a Dynamic Programming approach to construct an optimal single-tree representation of the ensemble. The idea is to use the property of optimal substructure: we can limit the search to those optimal born-again trees whose left and right sub-trees are themselves optimal born-again trees for their respective regions.
This allows us to frame the search as a recursion.
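Paraphrasing the recursion (the symbols below are my own shorthand, not the paper's exact notation): write Φ(z_l, z_r) for the depth of an optimal born-again tree over the region spanned by cells z_l and z_r, and ID(z_l, z_r) for the condition that the region is already homogeneous. Then

$$
\Phi(z_l, z_r) =
\begin{cases}
0, & \text{if } \mathrm{ID}(z_l, z_r),\\[4pt]
1 + \min\limits_{j}\ \min\limits_{z_{l,j} \le v < z_{r,j}} \max\Big\{\Phi\big(z_l,\; z_r|_{j \to v}\big),\ \Phi\big(z_l|_{j \to v+1},\; z_r\big)\Big\}, & \text{otherwise,}
\end{cases}
$$

where $z|_{j \to v}$ means cell $z$ with its $j$-th coordinate replaced by $v$. Splitting the region on feature $j$ at level $v$ gives a left sub-region capped at $v$ and a right sub-region starting at $v + 1$, and the tree needs a depth of one plus the deeper of the two optimal sub-trees.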
In the DP equation, evaluating the base case requires checking whether the class is homogeneous over the whole region (the ID condition) every time. We can avoid repeating this work by integrating the check into the main recursion and memoising it, so it is evaluated just once per region and reused whenever needed.
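Here is a minimal Python sketch of that memoised recursion. It is my own toy version (a small grid of cells and a stand-in for the forest's majority vote), not the authors' implementation:

```python
import itertools
from functools import lru_cache

# Toy setting (made up for illustration): 2 features whose splits create a
# 3 x 4 grid of cells, and a placeholder for the forest's majority vote per cell.
LEVELS = (3, 4)

def forest_class(cell):
    return int(cell[0] + cell[1] >= 3)   # stand-in for the ensemble's prediction

@lru_cache(maxsize=None)
def region_class(zl, zr):
    """Majority class if the region [zl, zr] is homogeneous, else None (memoised)."""
    classes = {forest_class(c) for c in itertools.product(
        *(range(lo, hi + 1) for lo, hi in zip(zl, zr)))}
    return classes.pop() if len(classes) == 1 else None

@lru_cache(maxsize=None)
def phi(zl, zr):
    """Depth of an optimal born-again tree for the region [zl, zr]."""
    if region_class(zl, zr) is not None:          # homogeneity check folded in, cached
        return 0
    best = float("inf")
    for j in range(len(zl)):                      # try every feature...
        for v in range(zl[j], zr[j]):             # ...and every split level on it
            left  = (zl, zr[:j] + (v,) + zr[j + 1:])
            right = (zl[:j] + (v + 1,) + zl[j + 1:], zr)
            best = min(best, 1 + max(phi(*left), phi(*right)))
    return best

print(phi((0, 0), (LEVELS[0] - 1, LEVELS[1] - 1)))   # depth of the born-again tree
```

Both phi and region_class are cached on the region, so the homogeneity of a region is computed at most once even though many recursive paths reach it.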
In the DP equation, we also search over all possible hyperplanes for each feature. It turns out we do not need this linear scan: a number of those candidates can be discarded, and a binary search can find the best hyperplane for a feature. The authors give a detailed proof of this.
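A hedged sketch of the binary-search idea, reusing phi and the tuple regions from the sketch above (my paraphrase of the argument, not the paper's exact procedure): enlarging the left child (growing v) can only increase its optimal depth, and shrinking the right child can only decrease its, so the two depth curves cross once and max(left, right) is minimised at or right next to that crossing.

```python
def best_split_on_feature(zl, zr, j):
    """Find the best split level on feature j by binary search instead of a scan."""
    assert zl[j] < zr[j], "feature j must span at least two cells in this region"
    def left(v):  return phi(zl, zr[:j] + (v,) + zr[j + 1:])      # sub-region capped at v
    def right(v): return phi(zl[:j] + (v + 1,) + zl[j + 1:], zr)  # sub-region from v + 1
    lo, hi = zl[j], zr[j] - 1
    while lo < hi:                          # find the first v where the curves cross
        mid = (lo + hi) // 2
        if left(mid) < right(mid):
            lo = mid + 1
        else:
            hi = mid
    candidates = {lo, max(lo - 1, zl[j])}   # optimum sits at the crossing or just before
    best_v = min(candidates, key=lambda v: max(left(v), right(v)))
    return best_v, 1 + max(left(best_v), right(best_v))
```

Swapping this into phi's inner loop turns the linear scan over split levels into a logarithmic number of evaluations per feature.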
1. Post pruning
Since the born-again decision tree tries to replicate the exact behaviour of the random forest over the entire feature space, it ends up encoding conditions that are either impossible in real life or never satisfied by any training data. This happens primarily because the trees in the random forest are independent, and a single tree that replicates all their joint conditions will contain some improbable cases. So, once the born-again tree is built, pruning is done from the bottom up: nodes that have no training point falling into them are removed and replaced by the child node of the branch that does have samples.
However, after this step we can no longer claim that the born-again decision tree replicates the random forest over the entire feature space. The good news is that this has very little effect on performance (a 0.0001% accuracy delta on test data) and it also leads to a much more interpretable tree, with lower depth and fewer leaves.
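A minimal sketch of that pruning rule, assuming a simple dict-based node layout (this layout and the function below are mine, not the paper's):

```python
def prune(node, X):
    """Collapse any split where one side receives no training samples.

    Assumed node layout: a leaf is {"class": c}; an internal node is
    {"feature": j, "threshold": t, "left": ..., "right": ...}.
    """
    if "class" in node:                          # leaf: nothing to prune
        return node
    j, t = node["feature"], node["threshold"]
    left_rows  = [x for x in X if x[j] <= t]
    right_rows = [x for x in X if x[j] > t]
    if not left_rows:                            # no training point goes left:
        return prune(node["right"], right_rows)  # replace the node by its right child
    if not right_rows:                           # no training point goes right
        return prune(node["left"], left_rows)
    node["left"] = prune(node["left"], left_rows)
    node["right"] = prune(node["right"], right_rows)
    return node
```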
2. Heuristic
Early on in the paper, the authors prove that finding the born-again tree is NP-Hard, via a reduction from the 3-SAT problem. The computation time becomes impractical once the number of features grows beyond 20. To deal with this, they suggest a heuristic in which the born-again tree is still completely faithful to the Random Forest over the entire feature space but is no longer guaranteed to be of minimum depth. The heuristic is as follows:
a. For a region (z_l, z_r), instead of going over each cell and starting the recursion for each one, check the class of Nc = 1000 randomly picked cells.
b. If all those Nc cells have the same class, use Integer Programming to confirm whether the entire region (z_l, z_r) is homogeneous. If it is, we declare it a leaf and end the recursion; otherwise, we continue splitting until we find homogeneous regions.
This heuristic reduces the computation time significantly on large datasets (1381 seconds -> 1.14 seconds), at the cost of a 20-22% increase in the number of leaves and in depth, which is a fair trade-off.
I really liked the way the paper is presented: it starts with the theoretical proofs and the optimal algorithm, while acknowledging that this alone won't be that useful for practical use cases, and then goes on the journey of adding tricks and heuristics to make it practically applicable. With sound proofs, detailed experimental results and nifty tricks, it was a really engaging and interesting read!
Footnote: the ID(z_l, z_r) condition in the DP recursion is True when every cell enclosed in the region gives the same majority class, i.e. the region is homogeneous.
Week 2 paper: "Born-Again Tree Ensembles" by Thibaut Vidal and Maximilian Schiffer.