Increasing Interpretability and Prediction Rate by combining Self-organizing Maps with Modeling Algorithms
Keywords: explanation, self-organizing map, risk estimation, postprocessing
Abstract: We consider supervised learning problems in which we need not only an accurate model but also a model that explains the relation between the inputs and a target variable. In some modeling problems, production experts can gauge their confidence in the results through modeling metrics such as accuracy, but they also need an explanation of what caused a desirable or undesirable situation or system state in the past. In this study we combine self-organizing maps with multiple linear modeling to increase both interpretability and accuracy. We assume that the target variable can be explained differently by the different patterns that characterize the input data. By solving a clustering problem for a subset of the inputs, we structure the data and relate each cluster to its representative, or cluster profile, which explains the cluster. Based on that structure, we build a linear model for each cluster's dataset; the coefficients of this model explain the influence of the factors under that cluster's input characteristics. To reduce the number of inputs, we use L1 regularization in the linear model. The proposed approach was tested on several industry-related problems and implemented in an application.
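The pipeline outlined in the abstract (cluster a subset of the inputs with a self-organizing map, then fit an L1-regularized linear model per cluster) can be illustrated with a minimal sketch. It is not the authors' implementation. All function names, parameter values, and the synthetic two-regime data are assumptions made for illustration. The SOM is a bare-bones online variant, and the L1-regularized fit uses plain coordinate descent.

```python
import numpy as np

def train_som(X, rows, cols, n_iter=500, lr0=0.5, sigma0=1.0, seed=0):
    """Train a tiny rectangular SOM; returns a codebook (rows*cols, n_features)."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    codebook = X[rng.choice(len(X), size=rows * cols, replace=False)].copy()
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3     # shrinking neighbourhood radius
        x = X[rng.integers(len(X))]
        bmu = np.argmin(((codebook - x) ** 2).sum(axis=1))  # best-matching unit
        h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2.0 * sigma ** 2))
        codebook += lr * h[:, None] * (x - codebook)
    return codebook

def assign_clusters(X, codebook):
    """Label each row with the index of its nearest codebook vector (cluster profile)."""
    d = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def lasso_cd(X, y, alpha=0.1, n_iter=200):
    """Minimize (1/2n)||y - b0 - X b||^2 + alpha * ||b||_1 by coordinate descent."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.zeros(p)
    col_sq = (Xc ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid = yc - Xc @ beta + Xc[:, j] * beta[j]   # partial residual
            rho = Xc[:, j] @ resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
    return y_mean - x_mean @ beta, beta

def fit_cluster_models(X_cluster, X_model, y, codebook, alpha=0.1, min_size=5):
    """Fit one sparse linear model per SOM cluster that has enough samples."""
    labels = assign_clusters(X_cluster, codebook)
    models = {}
    for k in np.unique(labels):
        mask = labels == k
        if mask.sum() >= min_size:
            models[int(k)] = lasso_cd(X_model[mask], y[mask], alpha)
    return labels, models

# Synthetic data (an assumption): two operating regimes, each with its own driver.
rng = np.random.default_rng(1)
n = 200
regime = np.repeat([0, 1], n // 2)
X_cluster = rng.normal(loc=regime[:, None] * 5.0, scale=0.3, size=(n, 2))
X_model = rng.normal(size=(n, 3))
y = np.where(regime == 0, 2.0 * X_model[:, 0], -3.0 * X_model[:, 1])
y = y + rng.normal(scale=0.05, size=n)

codebook = train_som(X_cluster, rows=1, cols=2)
labels, models = fit_cluster_models(X_cluster, X_model, y, codebook)
for k, (intercept, beta) in models.items():
    print(k, "profile:", np.round(codebook[k], 2), "coefficients:", np.round(beta, 2))
```

In this sketch each codebook row plays the role of a cluster profile, and the nonzero coefficients of each per-cluster model indicate which factors matter under that profile. The penalty `alpha` shrinks the remaining coefficients toward zero, which is how the number of inputs is reduced.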
Copyright (c) 2022 Ivan Ryzhikov, Mikko Huovinen, Yrjö Hiltunen
This work is licensed under a Creative Commons Attribution 4.0 International License.