Multiobjective Evolutionary Optimization of Type-2 Fuzzy Rule-Based Systems for Financial Data Classification
Antonelli M.
2017-01-01
Abstract
Classification techniques are becoming essential in the financial world for reducing risks and possible disasters. Managers are interested not only in high accuracy but also in interpretability and transparency. It is now widely accepted that understanding how inputs and outputs are related is crucial for making operational and strategic decisions. Furthermore, inputs are often affected by contextual factors and characterized by a high level of uncertainty. In addition, financial data are usually highly skewed toward the majority class. With the aim of achieving high accuracy, preserving interpretability, and managing uncertain and unbalanced data, this paper presents a novel method for financial data classification that adopts type-2 fuzzy rule-based classifiers (FRBCs) generated from data by a multiobjective evolutionary algorithm (MOEA). The classifiers employ an approach, denoted as scaled dominance, that defines rule weights so as to help minority classes be correctly classified. In particular, we have extended PAES-RCS, an MOEA-based approach that concurrently learns the rule and data bases of FRBCs, to manage both interval type-2 fuzzy sets and unbalanced datasets. To the best of our knowledge, this is the first work that generates type-2 FRBCs by concurrently maximizing accuracy and minimizing the number of rules and the rule length, with the objective of producing interpretable models of real-world skewed and incomplete financial datasets. The rule bases are generated by exploiting a rule and condition selection (RCS) approach, which, during the evolutionary process, selects a reduced number of rules from a heuristically generated rule base and a reduced number of conditions for each selected rule. The weight associated with each rule is scaled by the scaled dominance approach on the fuzzy frequency of the output class, in order to give a higher weight to the minority class.
As regards data base learning, the membership function parameters of the interval type-2 fuzzy sets used in the rules are learned concurrently with the application of RCS. Unbalanced datasets are managed by using, in addition to complexity, selectivity and specificity as objectives of the MOEA rather than only the classification rate. We tested our approach, named IT2-PAES-RCS, on 11 financial datasets and compared our results with those obtained by the original PAES-RCS with three objectives, with and without scaled dominance; by the FRBCs generated by the fuzzy association rule-based classification model for high-dimensional datasets (FARC-HD) and by the fuzzy unordered rules induction algorithm (FURIA); and by the classical C4.5 decision tree algorithm and its cost-sensitive version. Using nonparametric statistical tests, we show that IT2-PAES-RCS generates FRBCs with, on average, accuracy statistically comparable with and complexity lower than those generated by the two versions of the original PAES-RCS. Further, the FRBCs generated by FARC-HD and FURIA and the decision trees computed by C4.5 and its cost-sensitive version, despite their higher complexity, prove to be less accurate than the FRBCs generated by IT2-PAES-RCS. Finally, we highlight how these FRBCs are easily interpretable by showing and discussing one of them.
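The abstract's point that the classification rate alone is a poor objective on skewed data can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a binary problem and reads "selectivity" as the true positive rate on the minority class and "specificity" as the true negative rate; that reading, the function name `rates`, and the toy labels are all illustrative assumptions.

```python
def rates(y_true, y_pred, positive=1):
    """Return (accuracy, true positive rate, true negative rate).

    Assumption: 'positive' marks the minority class; the rates are the
    standard confusion-matrix ratios, not the paper's exact objectives.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    accuracy = (tp + tn) / len(pairs)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return accuracy, tpr, tnr


# Toy skewed dataset: 95 majority-class (0) and 5 minority-class (1) examples.
y_true = [0] * 95 + [1] * 5

# Classifier A always predicts the majority class.
always_majority = [0] * 100

# Classifier B raises 10 false alarms but recovers 4 of the 5 minority examples.
minority_aware = [1] * 10 + [0] * 85 + [1] * 4 + [0] * 1

acc_a, tpr_a, tnr_a = rates(y_true, always_majority)
acc_b, tpr_b, tnr_b = rates(y_true, minority_aware)

print(f"A: accuracy={acc_a:.2f} TPR={tpr_a:.2f} TNR={tnr_a:.2f}")
print(f"B: accuracy={acc_b:.2f} TPR={tpr_b:.2f} TNR={tnr_b:.2f}")
```

In this toy case classifier A has the higher classification rate (0.95 against 0.89) yet never detects a minority example. Treating the two per-class rates as separate objectives exposes that trade-off, whereas a single accuracy figure hides it.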