Variable Selection and Grouping for Large-scale Data-driven Modelling


  • Esko K. Juuso



variable selection and grouping, data analysis, intelligent methods, data-driven modelling


For large-scale systems, the number of possible variable combinations becomes very large. Variable grouping means finding feasible groups of variables for modelling. Systems can be divided into subsystems, but even then the number of available variables is often too high for practical use with data-based methods. Interactive variable selection and grouping, in which the performance of alternative models is compared, is a good solution if there are not too many variables. This paper describes the possibilities for variable selection in large-scale industrial systems. It classifies variable selection and grouping into four categories: knowledge-based grouping, grouping with data analysis, decomposition, and model-based grouping and selection. The data analysis part consists of correlation analysis and the handling of high-dimensional data with principal components. These originally linear methodologies were extended to nonlinear systems by using the nonlinear scaling approach. Decomposition can be realised with various clustering methods or by learning with case-based reasoning. Multimodel systems are handled with fuzzy set systems. Numerous studies based on linear multivariate statistical modelling have been reported in the literature. The approaches have been tested in several applications: bioprocesses, continuous brewing, condition monitoring, web break sensitivity analysis and wastewater treatment. Industrial process data, a pilot system and a test rig were used in the analysis. Uncertainty handling is a part of the analysis method: uncertainty is represented with degrees of membership.
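As a rough illustration of grouping with data analysis, the sketch below groups variables whose pairwise correlation is high. It is a minimal example under stated assumptions, not the method of the paper: it uses plain linear Pearson correlation without the nonlinear scaling step, and the function name `group_by_correlation`, the threshold of 0.8 and the synthetic data are all hypothetical choices made for the example.

```python
import numpy as np


def group_by_correlation(X, threshold=0.8):
    """Group the columns of X by absolute Pearson correlation.

    Two columns are linked when |r| > threshold; the returned groups are the
    connected components of that link graph (lists of column indices).
    Both the linking rule and the threshold are illustrative assumptions.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    n = corr.shape[0]
    parent = list(range(n))  # union-find forest over column indices

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if corr[i, j] > threshold:
                parent[find(j)] = find(i)  # merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Synthetic data: columns 0-1 share one signal, columns 2-3 share another,
# and column 4 is independent noise.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
X = np.column_stack([
    a,
    a + 0.1 * rng.normal(size=500),
    b,
    b + 0.1 * rng.normal(size=500),
    rng.normal(size=500),
])
groups = group_by_correlation(X)
print(groups)
```

Each resulting group could then be represented by a single variable or by a few principal components before modelling. In a real application the threshold would need to be tuned, and linear correlation would miss the nonlinear dependencies that the nonlinear scaling approach is meant to capture.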


