Diversity Enforcement in Ensemble Models
Completed at The University of Manchester, 2024
My MSc thesis explores the role of ensemble diversity in machine learning, focusing on the development and application of a unified theory of diversity in ensemble models. Ensemble learning combines the outputs of multiple models to improve performance on prediction tasks, and diversity among ensemble members is a critical factor in their success. This project investigates the bias-variance-diversity trade-off, reproduces key findings from prior research, and extends the Negative Correlation Learning (NCL) framework to a broader family of loss functions, specifically Bregman Divergences.
Project Overview
- Duration: February 2024 – September 2024 (8 months)
- Technologies and Tools: Python, PyTorch, scikit-learn, NumPy, Matplotlib
- Skills Developed: Machine Learning, Ensemble Methods, Theoretical Analysis, Experimentation, Math and Statistics
Key Objectives
- Understanding the Bias-Variance-Diversity Trade-off:
  - Reproduce and validate the unified theory of ensemble diversity, which decomposes the expected loss into bias, variance, diversity, and noise components (the squared-error form is sketched after this list).
  - Demonstrate that ensemble diversity increases with the number of ensemble members, confirming foundational results from prior research.
- Analyzing Classical Negative Correlation Learning (NCL):
  - Investigate the behavior of NCL under squared error loss, focusing on the impact of diversity enforcement on weak and high-capacity estimators (a training-loop sketch follows the decomposition below).
  - Provide empirical evidence supporting the benefits of diversity enforcement for weak estimators and its limitations for high-capacity models.
- Generalizing NCL to Bregman Divergences:
  - Extend the NCL framework to a broader family of loss functions, including cross-entropy loss, and identify the theoretical limitations of this generalization.
  - Propose a new formulation for diversity enforcement that adapts to changing upper bounds during training.
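For reference, the squared-error form of the decomposition behind the first objective can be written roughly as below. The notation here is mine rather than the thesis's: the ensemble averages M member predictions f_i into f̄, y* denotes the noise-free target, and σ² the label noise; the Bregman-divergence version replaces the squared differences with the corresponding divergences.

```latex
% Rough sketch of the bias-variance-diversity decomposition under squared error
% (notation assumed for illustration, not copied from the thesis).
\mathbb{E}\bigl[(\bar{f} - y)^2\bigr]
  = \underbrace{\sigma^2}_{\text{noise}}
  + \underbrace{\frac{1}{M}\sum_{i=1}^{M}\bigl(\mathbb{E}[f_i] - y^*\bigr)^2}_{\text{average bias}^2}
  + \underbrace{\frac{1}{M}\sum_{i=1}^{M}\operatorname{Var}(f_i)}_{\text{average variance}}
  - \underbrace{\frac{1}{M}\sum_{i=1}^{M}\mathbb{E}\bigl[(f_i - \bar{f})^2\bigr]}_{\text{diversity}}
```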
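And here is a minimal PyTorch sketch of classical NCL under squared error, the setting the second objective studies: each member is trained on its own squared error minus a penalty that rewards disagreement with the ensemble mean. The architectures, data, and lambda value are illustrative assumptions, not the thesis's experimental settings.

```python
# Minimal sketch of classical Negative Correlation Learning (NCL) with squared
# error. Architectures, data, and lambda are illustrative, not the thesis's.
import torch
import torch.nn as nn

def ncl_loss(member_outputs: torch.Tensor, targets: torch.Tensor, lam: float) -> torch.Tensor:
    """member_outputs: (M, batch) predictions of the M ensemble members.
    Per-member squared error minus lam times the squared deviation from the
    ensemble mean, so larger lam pushes members to disagree more."""
    ensemble_mean = member_outputs.mean(dim=0, keepdim=True)        # (1, batch)
    sq_error = 0.5 * (member_outputs - targets.unsqueeze(0)) ** 2   # (M, batch)
    diversity = (member_outputs - ensemble_mean) ** 2               # (M, batch)
    return (sq_error - lam * diversity).mean()

# Toy usage on synthetic data.
torch.manual_seed(0)
M = 5
members = nn.ModuleList(
    [nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1)) for _ in range(M)]
)
opt = torch.optim.Adam(members.parameters(), lr=1e-3)
x, y = torch.randn(64, 4), torch.randn(64)

for _ in range(100):
    outputs = torch.stack([m(x).squeeze(-1) for m in members])      # (M, batch)
    loss = ncl_loss(outputs, y, lam=0.3)                            # illustrative lambda
    opt.zero_grad()
    loss.backward()
    opt.step()
```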
Results and Insights
- Reproduction of Foundational Results:
  - Successfully reproduced the critical demonstration that ensemble diversity increases with the number of ensemble members, validating the unified theory of diversity.
  - Extended this demonstration to multiple loss functions (squared error, binary cross-entropy, multi-class cross-entropy) and datasets, confirming the generality of the findings.
- Insights into NCL Behavior:
  - Demonstrated that diversity enforcement significantly benefits weak estimators, reducing ensemble risk by balancing bias, variance, and diversity.
  - Identified that high-capacity models do not benefit from diversity enforcement, because it increases variance without a corresponding increase in diversity.
- Theoretical and Practical Extensions:
  - Proved that the upper bound for the diversity enforcement parameter (lambda) in NCL is not constant for Bregman Divergences: it depends on the ensemble's output, the member outputs, and the labels during training.
  - Proposed a new formulation for generalizing NCL to Bregman Divergences that sets lambda to a user-defined percentage of the dynamically calculated upper bound (a sketch of this idea follows this list).
- Diversity Enforcement for High-Capacity Models:
  - High-capacity models exhibited instability under diversity enforcement, with the added variance outweighing the benefits of diversity. This suggests the need for regularization techniques (e.g., layer normalization, residual connections) to stabilize training.
- Generalization to Cross-Entropy Loss:
  - Initial attempts to generalize NCL to cross-entropy loss diverged during training, highlighting the limitations of a constant lambda value. The proposed dynamic upper-bound formulation addresses this issue but requires further empirical validation.
- Experimental Validation:
  - The project emphasized rigorous experimental validation, including the development of a flexible experiment runner for bias-variance-diversity decomposition across different machine learning frameworks (a Monte Carlo sketch of such a decomposition estimate follows below).
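To make the dynamic upper-bound idea concrete, here is a minimal sketch of how a per-batch lambda could be wired into an NCL-style objective. The bound computation is deliberately left as a placeholder: the thesis derives it from the chosen Bregman divergence, the ensemble output, the member outputs, and the labels, and none of that derivation is reproduced here.

```python
# Sketch of the proposed dynamic-lambda formulation: each batch uses a
# user-defined fraction (kappa) of an upper bound recomputed from the current
# ensemble output, member outputs, and labels. `placeholder_bound` is a
# hypothetical stand-in, NOT the bound derived in the thesis.
import torch

def dynamic_ncl_loss(member_losses, member_diversity, bound_fn,
                     ensemble_output, member_outputs, targets, kappa=0.5):
    """member_losses: (M, batch) per-member losses against the targets.
    member_diversity: (M, batch) per-member divergence from the ensemble output.
    bound_fn: callable returning the current admissible upper bound on lambda."""
    lam_max = bound_fn(ensemble_output, member_outputs, targets)
    lam = kappa * float(lam_max)          # user-defined percentage of the bound
    return (member_losses - lam * member_diversity).mean()

# Toy usage with squared-error terms and a placeholder bound (illustrative only).
M, B = 4, 32
member_outputs = torch.rand(M, B)
targets = torch.rand(B)
ensemble_output = member_outputs.mean(dim=0)
member_losses = (member_outputs - targets) ** 2
member_diversity = (member_outputs - ensemble_output) ** 2
placeholder_bound = lambda ens, mem, y: torch.tensor(1.0)   # stand-in only
loss = dynamic_ncl_loss(member_losses, member_diversity, placeholder_bound,
                        ensemble_output, member_outputs, targets)
```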
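Along the same lines, a decomposition experiment of the kind the last bullet describes can be sketched as a simple Monte Carlo estimate: train many ensembles on resampled training sets and average the bias, variance, and diversity terms over trials. The shallow decision trees, bootstrap resampling, and Friedman #1 data below are illustrative choices; the thesis's experiment runner is more general and framework-agnostic.

```python
# Monte Carlo sketch of a squared-error bias-variance-diversity estimate:
# train T ensembles of M members on bootstrap resamples, then average the
# decomposition terms over trials. Models and data here are illustrative.
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X_train, y_train = make_friedman1(n_samples=300, random_state=0)
X_test, y_test = make_friedman1(n_samples=200, random_state=1)

T, M = 50, 10                        # trials, ensemble size
member_preds = np.empty((T, M, len(X_test)))
for t in range(T):
    for m in range(M):
        idx = rng.integers(0, len(X_train), len(X_train))   # bootstrap resample
        tree = DecisionTreeRegressor(max_depth=3, random_state=t * M + m)
        tree.fit(X_train[idx], y_train[idx])
        member_preds[t, m] = tree.predict(X_test)

ens_preds = member_preds.mean(axis=1)                        # (T, n_test)
mean_member = member_preds.mean(axis=0)                      # E over trials, per member
avg_bias2 = ((mean_member - y_test) ** 2).mean()
avg_var = member_preds.var(axis=0).mean()
diversity = ((member_preds - ens_preds[:, None, :]) ** 2).mean()
ensemble_risk = ((ens_preds - y_test) ** 2).mean()
print(f"avg bias^2 {avg_bias2:.3f} + avg var {avg_var:.3f} "
      f"- diversity {diversity:.3f} ~= ensemble risk {ensemble_risk:.3f}")
```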
Conclusion
This thesis advances the understanding of ensemble diversity by validating a unified theory of diversity, analyzing the behavior of classical NCL, and proposing a generalization of NCL to Bregman Divergences. The findings highlight the importance of diversity enforcement for weak estimators and provide a clear path forward for extending diversity-based training to a broader range of loss functions. The project results have potential applications in improving model robustness, generalization, and performance.