Back to AI Research

AI Research

Quantifying Sensitivity for Tree Ensembles: A symbo... | AI Research

Key Takeaways

  • Decision Tree Ensembles (DTEs) are widely used in critical fields like finance to make predictions based on tabular data.
  • One such verification question is the problem of sensitivity, which asks, given a DTE, whether a small change in subset of features can lead to misclassification of the input.
  • In this work, our focus is to build a quantitative notion of sensitivity, tailored to DTEs, by discretizing the input space of the model and enumerating the regions which are susceptible to sensitivity.
  • We propose a novel algorithmic technique that can perform this computation efficiently, within a certified error and confidence bound.
  • Our approach is based on encoding the problem as an algebraic decision diagram (ADD), and further splitting it into subproblems that can be solved efficiently and make the computation compositional and scalable.
Paper AbstractExpand

Decision tree ensembles (DTE) are a popular model for a wide range of AI classification tasks, used in multiple safety critical domains, and hence verifying properties on these models has been an active topic of study over the last decade. One such verification question is the problem of sensitivity, which asks, given a DTE, whether a small change in subset of features can lead to misclassification of the input. In this work, our focus is to build a quantitative notion of sensitivity, tailored to DTEs, by discretizing the input space of the model and enumerating the regions which are susceptible to sensitivity. We propose a novel algorithmic technique that can perform this computation efficiently, within a certified error and confidence bound. Our approach is based on encoding the problem as an algebraic decision diagram (ADD), and further splitting it into subproblems that can be solved efficiently and make the computation compositional and scalable. We evaluate the performance of our technique over benchmarks of varying size in terms of number of trees and depth, comparing it against the performance of model counters over the same problem encoding. Experimental results show that our tool XCount achieves significant speedup over other approaches and can scale well with the increasing sizes of the ensembles.

Decision Tree Ensembles (DTEs) are widely used in critical fields like finance to make predictions based on tabular data. While these models are generally interpretable, they are not immune to bias or sensitivity issues, where small changes in input features can lead to drastically different outcomes. This paper introduces a new way to quantify this sensitivity by measuring how much of a model’s input space is susceptible to such fluctuations. By discretizing the input space into regions, the researchers provide a formal, scalable method to count how many of these regions contain sensitive pairs, offering a clearer picture of a model's overall fairness.

Defining Sensitivity in Ensembles

The authors define sensitivity as a situation where two inputs differ only in "sensitive" features (such as race or gender) but result in a significant difference in the model's output. Previous research focused on finding a single "sensitive pair" to prove a model is flawed. However, this paper argues that finding one pair is insufficient to understand the full scope of a model's behavior. Instead, they propose a quantitative approach: counting the total number of input regions that contain at least one sensitive pair. This provides a more comprehensive metric for evaluating the fairness and robustness of a Decision Tree Ensemble.

A Compositional Algorithmic Approach

To solve this counting problem, the researchers developed a tool called XCount. The process begins by encoding the decision tree ensemble into an Algebraic Decision Diagram (ADD), a symbolic structure that represents the model's logic. Because compiling an entire ensemble into a single ADD can be computationally expensive, the team developed a compositional technique. This method breaks the problem into smaller, manageable subproblems that can be solved in parallel. They then use a sophisticated splitting and merging strategy to combine these results, ensuring the final count remains accurate while maintaining formal error and confidence guarantees.

Performance and Validation

The researchers tested XCount against a large suite of over 3,000 benchmark instances, varying in tree depth and ensemble size. The results demonstrate that XCount significantly outperforms existing monolithic approaches and standard model counters, often achieving speedups by an order of magnitude. Beyond raw performance, the authors validated their tool by applying it to regularized models. They confirmed that as regularization techniques are applied to a Decision Tree Ensemble, the count of sensitive regions decreases, proving that their tool effectively captures the intended impact of fairness-enhancing interventions.

Comments (0)

No comments yet

Be the first to share your thoughts!