Ensemble learning is a powerful approach for improving the performance of supervised learning algorithms. However, the nature and effect of diversity in ensembles has remained an open question for over 30 years, until now.
Diversity in ensembles refers to the differences among the individual models that make up the ensemble. By combining multiple diverse models, an ensemble can reduce the risk of overfitting and improve generalization performance.
Our framework reveals that diversity is a hidden dimension in the bias-variance decomposition of an ensemble. Understanding this decomposition therefore gives direct insight into how diversity affects ensemble performance.
We prove a family of exact bias-variance-diversity decompositions for both classification and regression losses, including squared error and cross-entropy. This provides a principled methodology for understanding the impact of diversity on ensemble performance.
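The squared-error case of such a decomposition can be checked directly. The following is a minimal sketch, not the paper's full derivation: with an arithmetic-mean combiner, the ensemble's squared error equals the average member error minus a diversity term (the spread of members around the ensemble), exactly, for any predictions and target. The numbers below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
target = 2.0
members = rng.normal(loc=2.5, scale=1.0, size=7)   # 7 hypothetical member predictions

ensemble = members.mean()                           # combiner: arithmetic mean
avg_member_loss = np.mean((members - target) ** 2)  # average individual squared error
diversity = np.mean((members - ensemble) ** 2)      # spread of members around the ensemble
ensemble_loss = (ensemble - target) ** 2

# The identity holds exactly: ensemble loss = average member loss - diversity.
assert np.isclose(ensemble_loss, avg_member_loss - diversity)
```

Because the diversity term is never negative, the ensemble is always at least as accurate (under squared error) as its average member.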
The formulation of diversity depends on only two design choices: the loss function and the combiner rule. By identifying the combiner rule matched to each loss, we can better understand the role that diversity plays in improving performance.
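To make the loss/combiner pairing concrete: for squared loss the matching combiner is the arithmetic mean, while for cross-entropy the matching combiner in this framework turns out to be a normalized geometric mean of the member distributions (a log-opinion pool). The member probabilities below are illustrative, not from the paper.

```python
import numpy as np

# Hypothetical member predictions: 3 models, each a distribution over 4 classes.
members = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.4, 0.3, 0.2, 0.1],
    [0.5, 0.2, 0.2, 0.1],
])

# Squared loss pairs with the arithmetic-mean combiner.
arith = members.mean(axis=0)

# Cross-entropy pairs with a normalized geometric mean:
# average the log-probabilities, exponentiate, and renormalize.
geo = np.exp(np.log(members).mean(axis=0))
geo /= geo.sum()

# Both combiners yield valid probability distributions.
assert np.isclose(arith.sum(), 1.0)
assert np.isclose(geo.sum(), 1.0)
```

The geometric-mean combiner down-weights classes that any member considers unlikely, which is why it differs from a simple probability average.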
For certain choices, such as 0-1 loss with majority voting, the effect of diversity necessarily depends on the target label: the same amount of disagreement among members can help or hurt depending on whether the majority is correct. By accounting for this, we can build more effective ensembles for specific tasks.
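A toy example (with made-up votes) shows this label dependence under majority voting: two ensembles with the same level of disagreement, where diversity reduces the loss in one case and increases it in the other.

```python
import numpy as np

def zero_one(pred, y):
    """0-1 loss: 1 if the prediction is wrong, else 0."""
    return float(pred != y)

def majority(votes):
    """Majority-vote combiner (odd number of voters assumed)."""
    vals, counts = np.unique(votes, return_counts=True)
    return vals[np.argmax(counts)]

y = 1  # true label (illustrative)
helpful = np.array([1, 1, 0])  # one dissenter; majority still correct
harmful = np.array([0, 0, 1])  # same disagreement; majority now wrong

# Average member loss is 1/3 vs 2/3, but the effect of disagreement flips:
help_ens = zero_one(majority(helpful), y)   # diversity reduced the ensemble loss to 0
harm_ens = zero_one(majority(harmful), y)   # diversity pushed the ensemble loss to 1
```

This is why, unlike the squared-loss case, diversity under 0-1 loss cannot be treated as unconditionally beneficial.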
Experiments illustrate how our framework can be used to understand the diversity-encouraging mechanisms of popular ensemble methods such as Bagging, Boosting, and Random Forests. By gaining a better understanding of diversity and its impact on ensemble performance, we can build more robust and effective learning systems.
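As a small numpy-only sketch of the kind of analysis such experiments enable, the following stands in for Bagging with bootstrap-resampled polynomial fits as hypothetical base learners (not the paper's experimental setup), and measures the diversity its resampling induces.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic regression data (illustrative):
x = rng.uniform(-1, 1, 200)
y = np.sin(3 * x) + rng.normal(0, 0.3, size=x.shape)
x_test = np.linspace(-1, 1, 100)
y_test = np.sin(3 * x_test)

# Bagging: each member is a degree-5 polynomial fit on a bootstrap resample.
M = 25
preds = []
for _ in range(M):
    idx = rng.integers(0, len(x), len(x))
    coeffs = np.polyfit(x[idx], y[idx], deg=5)
    preds.append(np.polyval(coeffs, x_test))
preds = np.array(preds)

ensemble = preds.mean(axis=0)                       # arithmetic-mean combiner
avg_member_loss = np.mean((preds - y_test) ** 2)    # average member squared error
diversity = np.mean((preds - ensemble) ** 2)        # disagreement induced by resampling
ensemble_loss = np.mean((ensemble - y_test) ** 2)

# The decomposition holds exactly per test point, hence also on average:
assert np.isclose(ensemble_loss, avg_member_loss - diversity)
assert diversity > 0
```

The strictly positive diversity term quantifies how much Bagging's resampling buys: the ensemble's error sits below the average member's error by exactly that amount.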