Consolidating Industrial Batch Process Data for Machine Learning


  • Simon Mählkvist
  • Jesper Ejenstam
  • Konstantinos Kyprianidis



Batch Data Analysis (BDA), batch preprocessing, Functional Data Analysis (FDA), Statistical Pattern Analysis (SPA), Kernel Principal Component Analysis (KPCA)


The paradigm change of Industry 4.0 draws attention to data-driven modeling and creates an incentive to apply machine learning methods in the process industry. However, capitalizing on the large amount of available data is a challenging task. For batch processes, the dataset is in a three-way format (Batch × Sensor × Time). Depending on the process and the goal of the analysis, it might be necessary to aggregate batches. For this reason, a campaign unfolding structure is applied: campaigns are created by grouping the batches under new labels relevant to the analytical goal. These labels can be derived from periodic events, such as the refurbishing of the refractory lining in the case study. To use the three-way batch format, the batches must first be aligned. To address this, the feature-oriented approach Statistical Pattern Analysis (SPA) is applied. SPA derives statistics, e.g., mean, skewness, and kurtosis, from the time series; because every batch is reduced to the same set of statistics regardless of its duration, this also aligns the batches. Together, SPA and the campaign approach produce a dataset consisting of selected statistics instead of an irregular three-way array. Before the features are created, Functional Data Analysis (FDA) is used to smooth the signals of the sensors in which functional behavior can be observed and to extract their first- and second-order derivative information. Principal Component Analysis (PCA) is used to examine the final dataset. Further, industrial processes are notoriously nonlinear, and batch processes even more so. Therefore, Kernel Principal Component Analysis (KPCA) is also used to review the final dataset. KPCA can accommodate different underlying characteristics by modifying the kernel function used.
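The pipeline outlined above can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes synthetic data, hypothetical function names (`smooth`, `spa_features`, `batch_features`, `kernel_pca`) and hypothetical campaign labels. A centered moving average followed by finite differences stands in for the basis-function smoothing normally used in FDA. The SPA statistics are taken to be mean, standard deviation, skewness and excess kurtosis, and the RBF kernel width `gamma = 1 / n_features` is an assumed default. The sketch shows how batches of unequal duration become fixed-length feature rows, and how switching the kernel changes what KPCA captures.

```python
import numpy as np


def smooth(x, window=5):
    """Centered moving average with edge padding.

    A simple stand-in (assumption) for the basis-function smoothing used in FDA.
    `window` must be odd so the output keeps the input length.
    """
    half = window // 2
    padded = np.pad(x, half, mode="edge")
    return np.convolve(padded, np.ones(window) / window, mode="valid")


def spa_features(x):
    """SPA-style statistics of one trajectory: mean, std, skewness, excess kurtosis."""
    mu, sd = x.mean(), x.std()
    z = (x - mu) / sd if sd > 0 else np.zeros_like(x)
    return np.array([mu, sd, np.mean(z**3), np.mean(z**4) - 3.0])


def batch_features(batch, window=5):
    """Map one batch of shape (T_b, n_sensors) to a fixed-length feature vector.

    Statistics are taken from the smoothed signal and its first and second
    finite-difference derivatives, so batches of different length T_b align.
    """
    feats = []
    for s in range(batch.shape[1]):
        xs = smooth(batch[:, s], window)
        d1 = np.gradient(xs)
        d2 = np.gradient(d1)
        for signal in (xs, d1, d2):
            feats.append(spa_features(signal))
    return np.concatenate(feats)


def kernel_pca(X, n_components=2, kernel="rbf", gamma=None):
    """Kernel PCA scores of the rows of X; kernel="linear" reproduces ordinary PCA."""
    n = X.shape[0]
    if kernel == "linear":
        K = X @ X.T
    else:
        gamma = 1.0 / X.shape[1] if gamma is None else gamma
        sq = np.sum(X**2, axis=1)
        dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
        K = np.exp(-gamma * dist2)
    # Center the kernel matrix in feature space
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one
    vals, vecs = np.linalg.eigh(Kc)
    order = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[order], vecs[:, order]
    # Training-point scores equal sqrt(lambda_k) * (unit-norm eigenvector k)
    return vecs * np.sqrt(np.maximum(vals, 0.0)), vals


# Synthetic example: 20 batches of unequal length, 3 sensors, 2 campaigns
rng = np.random.default_rng(0)
batches, campaign = [], []
for b in range(20):
    label = b // 10                  # hypothetical campaign label (e.g. lining cycle)
    T = int(rng.integers(80, 121))   # batch duration varies between 80 and 120 samples
    t = np.linspace(0.0, 1.0, T)
    base = np.column_stack([np.sin(2 * np.pi * t), t**2, np.exp(-3 * t)])
    drift = 0.3 * label              # campaign-dependent offset
    batches.append(base + drift + 0.05 * rng.standard_normal((T, 3)))
    campaign.append(label)

X = np.vstack([batch_features(B) for B in batches])  # (20, 3 sensors * 3 signals * 4 stats)
campaign = np.array(campaign)

# Autoscale columns before PCA/KPCA; guard against constant columns
sd = X.std(axis=0)
Xs = (X - X.mean(axis=0)) / np.where(sd > 0, sd, 1.0)

scores_rbf, _ = kernel_pca(Xs, n_components=2, kernel="rbf")
scores_lin, eig_lin = kernel_pca(Xs, n_components=2, kernel="linear")
```

With `kernel="linear"` the scores coincide with ordinary PCA scores (up to the sign of each component). The RBF kernel is one possible choice for nonlinear structure. The `campaign` array can be used to color the score plots, to check whether the campaigns separate.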

