Increasing Interpretability and Prediction Rate by combining Self-organizing Maps with Modeling Algorithms

Authors

  • Ivan Ryzhikov
  • Mikko Huovinen
  • Yrjö Hiltunen

DOI:

https://doi.org/10.3384/ecp2118592

Keywords:

explanation, self-organizing map, risk estimation, postprocessing

Abstract

We consider supervised learning problems in which we need not only an accurate model but also a model that explains the relation between the inputs and the target variable. In some modeling problems, production experts can gauge their confidence in the results through modeling metrics such as accuracy, but they also need an explanation of what caused a desirable or undesirable situation or system state in the past. In this study, we combine self-organizing maps with multiple linear modeling to increase both interpretability and accuracy. We assume that the target variable can be explained differently depending on the patterns that characterize the input data. By solving a clustering problem for a subset of the inputs, we structure the data and relate each cluster to its representative, or cluster profile, which explains the cluster. Based on that structure, we build a linear model for each cluster's dataset; the coefficients of this model explain the influence of the factors for inputs with those particular characteristics. To reduce the number of inputs, we apply L1 regularization to the linear model. The proposed approach was tested on several industry-related problems and implemented in an application.
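
The pipeline described in the abstract (cluster on a subset of inputs with a self-organizing map, then fit an L1-regularized linear model per cluster) can be illustrated with a minimal sketch. This is not the authors' implementation: the 1x2 map, the decay schedules, the coordinate-descent lasso, the synthetic data, and all function and variable names (`train_som`, `assign_clusters`, `lasso_cd`, `profile`, `factors`) are assumptions made for illustration only.

```python
import numpy as np

def train_som(X, rows=1, cols=2, n_iter=2000, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small rectangular SOM online; returns the codebook (one row per unit)."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    codebook = X[rng.choice(len(X), size=rows * cols, replace=False)].copy()
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3     # shrinking neighbourhood radius
        x = X[rng.integers(len(X))]
        bmu = np.argmin(((codebook - x) ** 2).sum(axis=1))   # best-matching unit
        d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)           # squared grid distance to BMU
        h = np.exp(-d2 / (2.0 * sigma ** 2))
        codebook += lr * h[:, None] * (x - codebook)
    return codebook

def assign_clusters(X, codebook):
    """Label each row of X with the index of its nearest codebook vector."""
    d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def lasso_cd(X, y, alpha=0.1, n_sweeps=200):
    """Lasso by cyclic coordinate descent.

    Minimises (1/2n) * ||y - b0 - X b||^2 + alpha * ||b||_1 and returns (b0, b).
    """
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    col_sq = (Xc ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    resid = yc.copy()
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = Xc[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]  # soft threshold
            resid -= Xc[:, j] * (new - beta[j])
            beta[j] = new
    return y_mean - x_mean @ beta, beta

# Synthetic example (assumed data): two operating regimes, visible in the
# "profile" inputs, under which factor x1 affects the target with opposite sign.
rng = np.random.default_rng(1)
n = 400
regime = rng.integers(0, 2, size=n)
profile = np.where(regime[:, None] == 0, -3.0, 3.0) + 0.3 * rng.normal(size=(n, 2))
factors = rng.normal(size=(n, 2))                 # x1 is relevant, x2 is not
y = np.where(regime == 0, 2.0, -2.0) * factors[:, 0] + 0.1 * rng.normal(size=n)

codebook = train_som(profile)                     # cluster profiles
labels = assign_clusters(profile, codebook)
models = {int(k): lasso_cd(factors[labels == k], y[labels == k])
          for k in np.unique(labels)}

for k, (b0, beta) in models.items():
    print(f"cluster {k}: profile={codebook[k].round(2)}, intercept={b0:.2f}, coef={beta.round(2)}")
```

In this sketch each codebook vector plays the role of a cluster profile, and the per-cluster coefficients show how the same factor can act differently under different input patterns. A single global linear model fitted to the same data would average the two opposing effects toward zero.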

Published

2022-03-31