Algebraic Model Counting for Global Analysis of Optimal Decision Trees
This paper introduces Algebraic Decision Tree Counting (ADTC), a formal framework designed to perform a comprehensive, global assessment of the entire hypothesis space of decision trees. Rather than searching for a single "best" model, which can lead to issues with model reliability and predictive multiplicity, ADTC evaluates all optimal and near-optimal trees simultaneously. By reformulating analytical tasks—such as counting, optimization, and sampling—into a unified algebraic computation, the framework allows researchers to navigate the "Rashomon set," a collection of models that perform similarly but offer different explanations.
A Unified Algebraic Approach
ADTC is inspired by Algebraic Model Counting (AMC), a technique used in knowledge representation to aggregate weights of satisfying assignments. The framework treats decision trees as structures that can be evaluated over a commutative semiring. By defining aggregation functions and constraint formulas, ADTC can solve diverse problems within a single system. For instance, by changing the underlying semiring, the same framework can be used to count the number of accurate trees, find the tree that minimizes error, or determine if a model meeting specific fairness and accuracy criteria exists.
Efficient Computation via Tensors
A major challenge in analyzing decision trees is that the number of possible trees grows at a double-exponential rate relative to the maximum tree depth. ADTC overcomes this bottleneck by using a dynamic programming algorithm that employs "model behavior tensors." These tensors aggregate information about tree metrics—such as size, error, and fairness—using convolution products. This approach reduces the computational complexity to a level that is polynomial in the number of features, allowing the framework to avoid the exhaustive enumeration of every possible tree while still providing an exact profile of the model landscape.
Global Navigation and Model Selection
The framework provides a transparent way to visualize trade-offs between competing objectives. By constructing a "model profile," users can examine the relationship between metrics like accuracy, F1-score, and fairness gaps. This is particularly useful in sensitive domains where stakeholders must balance predictive performance with interpretability or ethical constraints. The authors demonstrate the utility of this approach through their software, emtrees, which was tested on real-world datasets to show how ADTC facilitates evidence-based model selection by revealing the diversity of high-quality models available for a given task.
Comments (0)
to join the discussion
No comments yet
Be the first to share your thoughts!