A Dymola-Python framework for data-driven model creation and co-simulation

The introduction of cyber-physical systems has been a recent development in energy systems. Cyber-physical systems contain digital components for applications such as monitoring or control. In many cases, modeling multiple aspects of such cyber-physical systems poses a challenge to conventional simulation tools. In addition, recent modeling approaches, such as data-driven modeling, are being applied. Data-driven models, which may have a different architecture than traditional models, can be combined with traditional models through co-simulation methods. In co-simulation, components created in different simulation tools are coupled through standardized interfaces. This work presents a framework for data-driven model generation and co-simulation. The framework is implemented in Python and Dymola and is based on the Functional Mock-up Interface (FMI) standard. It covers the creation of data-driven models in Python, the generation of Functional Mock-up Units (FMUs) through the frameworks uniFMU and pythonFMU, as well as the creation of a testbench model in Dymola and the co-simulation of this model. The framework is demonstrated on a solar collector from a single-family house heating system.


Introduction
The area of energy systems covers a wide range of applications, such as heating, cooling or electrical power systems. All these systems have in common that their energy demand, which is constantly growing, must be met by the energy providers. To respond to the increasing demand, energy providers have recently been focusing on embedding cyber-technologies into their systems in order to monitor and optimize system operation. This means that state-of-the-art energy systems are being extended into complex cyber-physical systems (Lund et al., 2017). The analysis of such cyber-physical energy systems poses new challenges in the area of simulation and modeling due to these systems' complexity (Palensky, 2014). Cyber-physical systems combine computational systems with physical systems, meaning that their analysis requires combined modeling techniques for different system types. While the modeling of certain components can be implemented in specialized simulation tools, the full modeling of a combined system is a more difficult task. To model cyber-physical energy systems, different approaches exist, which can be classified into three groups: white-box, gray-box and black-box modeling (Arendt et al., 2018). White-box methods include traditional physical modeling methods based on system dynamics. Gray-box models may also be based on system dynamics, but may contain assumptions or approximations. Black-box models may have a completely different architecture than the underlying system. Traditionally, energy systems are modeled in simulation tools based on the physical relations of their components. Physical models are created by analyzing the physical properties of the system; they are mostly implemented as white-box or gray-box models and are based on knowledge of the system dynamics and parameters.
In order to model and simulate these systems, numerical solvers are often used to solve the underlying differential equations, as described by (Gomes et al., 2018). The numerical simulation methods are implemented by simulation tools such as Dassault Systemes Dymola®, MathWorks® Matlab/Simulink or EnergyPlus™. In contrast to traditional modeling, the data-driven modeling approach has recently been gaining popularity. Data-driven models are mainly based on modeling the underlying system as a black box. This means that the architecture and the parameters of the model are arbitrary; any structure can be used as a model. Data-driven models are mainly implemented by machine learning (ML) methods, such as linear regression models, decision-tree-based models or neural networks. In the data-driven approach, the models are trained on existing measurement data by using optimization methods. This approach was applied, for instance, in (Ghofrani et al., 2020) and (Xu et al., 2019). The advantage of the data-driven modeling approach is that the ML models are trained on measurement data and do not require exact knowledge of the system and its parameters. While domain knowledge is helpful in creating the models, it is not necessary to know all features of the underlying system beforehand. A recent approach in cyber-physical systems modeling is the combination of physical and data-driven models in a co-simulation (CS) environment. The term co-simulation describes the combination of different simulation tools or environments. This may include a combination of continuous-time and discrete-time models, as well as simulation tools like Dymola or Matlab/Simulink. In co-simulation, different systems are integrated into a global environment. The co-simulation approach is used in applications such as building control systems, especially in model-predictive control.
Proceedings of Asian Modelica Conference 2022, November 24-25, 2022, Tokyo, Japan
Applications in energy systems modeling or control often contain feedback loops containing components implemented in different simulation tools. These components must be coupled with each other through a defined interface. For this purpose, organizations such as the Modelica Association or the Institute of Electrical and Electronics Engineers (IEEE) have developed standards for co-simulation interfaces, such as the High-Level Architecture (HLA) (IEEE, 2010) or FMI (Modelica Association, 2020) standard. These standardized interfaces can be implemented by various tools without having to adapt the models for each simulation environment and are supported by different simulation tools. Additionally, these interfaces can be implemented by data-driven models, which may be created in programming languages such as Python. For our work, the FMI standard was selected.  (Leimeister, 2019). This framework is based on the buildingspy library. Another framework called PyMo was created by (Febres et al., 2014).

Main Contribution
This work presents a workflow called HybridCosim that combines the creation of data-driven models with co-simulation. In this workflow, data-driven models are automatically created and then combined with physical models inside a co-simulation environment. The workflow is based on version 2.0 of the FMI standard. The data-driven models are created in Python, converted into FMUs, and then simulated in Dymola as part of an automatically generated testbench. The framework supports the creation of ML models of different architectures, as well as simulation in Dymola. The framework is demonstrated on a case study of a solar collector.

Methodology
The presented framework consists of four steps. First, a data-driven model of an existing system is trained in Python. This model is then converted into an FMU. For the FMU, a Modelica testbench model is generated. Finally, the testbench is simulated in Dymola. An overview of the created workflow is given in Figure 2. The first three steps of the workflow are executed purely in Python, whereas the last step is executed through Python and Dymola. While the simulation itself is executed in Dymola, the orchestration and the result post-processing are done in Python. This framework is based on the research in (Falay et al., 2021) and (Wilfling et al.).

Model Training
To create data-driven models, we implemented a basic framework in Python to train models of different architectures, such as linear regression models, decision-tree-based models or Support Vector Machine (SVM) models. These models can be created from different datasets and feature configurations. The models are based on the research in (Schranz et al.) and the Python packages scikit-learn (Pedregosa et al., 2011) and statsmodels (Seabold and Perktold, 2010).

Interfacing - FMI
In our work, the FMI standard was used as an interface between models of different types. Therefore, the models had to be converted into the FMU format, for which the uniFMU framework (Legaard et al., 2021) and the pythonFMU framework (Hatledal et al., 2020) were evaluated. The uniFMU framework allows models written in different programming languages, such as Python, C#, Matlab or Java, to be exported as an FMU. uniFMU supports version 2.0 of the FMI standard and contains a graphical user interface to generate and validate FMUs. The pythonFMU framework supports FMU generation from Python files.

FMU Creation
In our work, machine learning models implemented in Python can be translated into the FMU format through the pythonFMU or uniFMU framework. While the pythonFMU framework supports the generation of a full FMU from a Python model, the uniFMU framework requires additional steps for creating the FMU. The FMU format contains a model description in Extensible Markup Language (XML). This description defines the model interface, consisting of the model inputs, outputs and parameters, and the basic model structure, which may include dependencies. To create an FMU through uniFMU, the model description must be adapted to the interface of the model.
For the FMU creation through uniFMU, a method was created that automatically adapts the FMU model description to the inputs and outputs required for the model. When using the framework, either of the two FMU export tools can be selected.
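The adaptation of the model description can be illustrated with a minimal sketch. It is not the framework's actual implementation: the function name `build_model_description`, the placeholder GUID and the variable names are assumptions made for illustration, and only the element and attribute names follow the FMI 2.0 model description schema. All inputs and outputs are assumed to be continuous real-valued signals.

```python
import xml.etree.ElementTree as ET


def build_model_description(model_name, guid, inputs, outputs):
    """Return a minimal FMI 2.0 co-simulation model description as XML text.

    Hypothetical helper: all variables are treated as continuous Reals.
    """
    root = ET.Element(
        "fmiModelDescription",
        {"fmiVersion": "2.0", "modelName": model_name, "guid": guid},
    )
    ET.SubElement(root, "CoSimulation", {"modelIdentifier": model_name})
    variables = ET.SubElement(root, "ModelVariables")

    output_indices = []
    # FMI 2.0 refers to variables by their 1-based position in ModelVariables.
    for position, name in enumerate(list(inputs) + list(outputs), start=1):
        is_output = position > len(inputs)
        variable = ET.SubElement(
            variables,
            "ScalarVariable",
            {
                "name": name,
                "valueReference": str(position - 1),
                "causality": "output" if is_output else "input",
                "variability": "continuous",
            },
        )
        if is_output:
            # Outputs are calculated by the model, so no start value is given.
            ET.SubElement(variable, "Real")
            output_indices.append(position)
        else:
            # Inputs need a start value in FMI 2.0.
            ET.SubElement(variable, "Real", {"start": "0.0"})

    structure = ET.SubElement(root, "ModelStructure")
    for section in ("Outputs", "InitialUnknowns"):
        element = ET.SubElement(structure, section)
        for index in output_indices:
            ET.SubElement(element, "Unknown", {"index": str(index)})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder GUID and variable names, chosen for illustration only.
    print(build_model_description(
        "SolarCollector",
        "{00000000-0000-0000-0000-000000000000}",
        ["T_R", "V_d", "T_A", "S_Global"],
        ["T_S"],
    ))
```

Generating the description from the list of inputs and outputs keeps the FMU interface consistent with the feature configuration used for training, which is the purpose of the automatic adaptation described above.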

Automatic Testbench Creation
In our framework, Dymola was selected as the main simulation master. Therefore, our top-level model had to be implemented in Dymola. To automatically create a simple testbench for the FMU, a Python module was created. This module generates a Modelica model based on input data, a specification of input and output features, and the FMU file. In addition, components created in Modelica can be imported and added to the model. The data-driven model is imported into Dymola and connected to Dymola-native modules or other FMUs. With this structure, it is possible to create fully coupled systems, such as feedback control loops, or simpler systems with fewer components.
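As a rough illustration of such a generator, the following sketch assembles the Modelica source of a testbench as plain text. The function name `generate_testbench`, the instance names `fmu` and `table`, and the class name of the imported FMU are assumptions made for this example and do not reflect the actual module. The sketch further assumes that the connectors of the imported FMU carry the same names as the model inputs and outputs.

```python
def generate_testbench(model_name, fmu_class, table_file, inputs, outputs):
    """Return Modelica source that feeds table columns into an imported FMU.

    Hypothetical helper. `table_file` should use forward slashes, because a
    backslash starts an escape sequence inside a Modelica string.
    """
    # Column 1 of the table file holds the time stamps, so the signals
    # occupy columns 2 .. len(inputs) + 1.
    last_column = len(inputs) + 1
    lines = [
        f"model {model_name}",
        f"  {fmu_class} fmu;",
        "  Modelica.Blocks.Sources.CombiTimeTable table(",
        "    tableOnFile=true,",
        '    tableName="data",',
        f'    fileName="{table_file}",',
        f"    columns=2:{last_column});",
    ]
    # Expose every FMU output as a top-level output of the testbench.
    lines += [f"  Modelica.Blocks.Interfaces.RealOutput {name};" for name in outputs]
    lines.append("equation")
    lines += [
        f"  connect(table.y[{i}], fmu.{name});"
        for i, name in enumerate(inputs, start=1)
    ]
    lines += [f"  connect(fmu.{name}, {name});" for name in outputs]
    lines.append(f"end {model_name};")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Class, file and signal names are placeholders for illustration.
    print(generate_testbench(
        "SolarCollectorTestbench",
        "SolarCollector_fmu",
        "C:/data/input.txt",
        ["T_R", "V_d", "T_A", "S_Global"],
        ["T_S"],
    ))
```

Additional Modelica components, for example a controller in a feedback loop, would be added as further declarations and `connect` statements in the same way.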

Simulation
For the generated top-level model including the FMU, a co-simulation was executed in the Dymola environment. This simulation was implemented using parts of the process created in (Wilfling et al.). In our implementation, the main control for the simulation is implemented in Python. The Python controller sends commands to Dymola, which executes the simulation. The simulation commands are based on Modelica .mos scripts, which are automatically generated in Python. Figure 2 gives an overview of the implemented simulation method (Wilfling et al.). Alternatively, the testbench can be simulated directly through Dymola.
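A generated .mos script could look like the output of the following sketch. The helper `generate_mos_script` and all file names are hypothetical. `openModel` and `simulateModel` are Dymola scripting functions, but the exact set of commands issued by the framework is not stated here, so the script content is an assumption.

```python
def generate_mos_script(model_file, model_name, stop_time, output_interval, result_file):
    """Return the text of a .mos script that loads and simulates one model.

    Hypothetical helper. Times are given in seconds and paths should use
    forward slashes.
    """
    lines = [
        f'openModel("{model_file}");',
        f'simulateModel("{model_name}", startTime=0, stopTime={stop_time}, '
        f'outputInterval={output_interval}, resultFile="{result_file}");',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example values: a 30-day window with a 15 min output interval.
    script = generate_mos_script(
        "C:/work/SolarCollectorTestbench.mo",
        "SolarCollectorTestbench",
        stop_time=30 * 24 * 3600,
        output_interval=15 * 60,
        result_file="solar_collector_result",
    )
    print(script)
```

Writing the commands to a script file keeps each simulation run reproducible, since the same file can also be executed manually from within Dymola.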

Framework Implementation
The framework was implemented mainly in Python. It is structured into four Python packages, each of which contains one step of the workflow. For each package, an example test script is available to execute the operations of the step. In addition, all steps can be executed in combination as a full workflow run. In this case, the four steps are executed sequentially. During the execution of each workflow step, different files are created, which are then used by the next steps. The combined workflow requires two components as inputs: a dataset, and a configuration file containing definitions of the model inputs and outputs. Figure 3 depicts the full workflow structure with input and output files.

File Structure
The results of an experiment using the combined framework are stored inside a directory structure containing all automatically generated files, including the models and the simulation results. An overview of this structure is given in Figure 4. The structure is separated into three directories: one for the model training results, one for the FMU files, and one for the Dymola testbench and simulation results.

Case Study
To demonstrate the proposed framework, a case study on a use case from the energy domain was performed. For this purpose, a solar collector from a single-family house heating system was selected (Wilfling et al.). In a single-family house, the main heating demand arises from the central heating for the rooms and the warm water consumption. To provide options for optimizing the heating energy consumption of such a house, the heating system should be modeled as accurately as possible. For this purpose, two different architectures for the data-driven model were evaluated.

Application - Solar Collector
The application of the case study was the supply temperature prediction for a flat-plate solar collector. This collector, which was already available as a physical model (Falay et al., 2021), was to be modeled through a data-driven model. For the collector, the supply temperature $T_S$ was to be predicted based on the return temperature $T_R$, the mass flow through the collector $V_d$, the ambient temperature $T_A$ and the solar radiation $S_{Global}$.

Underlying System
According to (Mahanta, 2020), the behavior of a flat-plate solar collector can be modeled through linear relations. The main factors affecting the solar collector supply temperature are the heat gain through the solar radiation and the heat loss to the ambient. In the active state of the collector, the heat gain is also affected by the mass flow through the collector. A simplified version of these relations can define the active behavior of the collector through Equation 1:

Data-driven Model
For the solar collector, a data-driven model was created through the model training part of the framework. To compare different model architectures, two models were created, one consisting of a linear regression model and one using Random Forest (RF) regression. The models were trained on measurement data covering the period from 02/2019 to 10/2019, sampled with a timestep of 15 min. For the training, a train-test split of 0.8 was selected. The trained models were stored in the Pickle format.

FMU Creation and Testbench Generation
From the trained models, an FMU was created. Afterwards, a Dymola model to test the FMU was generated. This Dymola model was generated using the input measurement data and the description of the FMU inputs and outputs. Figure 5 depicts the generated Dymola model. The Dymola model contains the FMU and a Modelica CombiTimeTable containing the measurement data. For the CombiTimeTable, a text file was automatically generated from the input data to act as the data source.
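The text file read by a CombiTimeTable follows a simple format: a first line `#1`, a header declaring the table name and its dimensions, and one row per time stamp with the time in the first column. The sketch below writes this format. The helper name `format_combitimetable`, the table name `data` and the sample values are assumptions for illustration and are not taken from the measurement data.

```python
def format_combitimetable(rows, table_name="data"):
    """Return the text of a table file readable by Modelica's CombiTimeTable.

    Hypothetical helper. Each row is [time_in_seconds, signal_1, signal_2, ...]
    and the rows must be sorted by time.
    """
    if not rows:
        raise ValueError("at least one row is required")
    n_columns = len(rows[0])
    if any(len(row) != n_columns for row in rows):
        raise ValueError("all rows must have the same number of columns")

    lines = ["#1", f"double {table_name}({len(rows)},{n_columns})"]
    for row in rows:
        lines.append(" ".join(repr(float(value)) for value in row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Two illustrative samples at a 15 min (900 s) timestep:
    # time, T_R, V_d, T_A, S_Global.
    sample = [
        [0, 40.0, 0.1, 10.0, 500.0],
        [900, 41.5, 0.1, 10.5, 520.0],
    ]
    print(format_combitimetable(sample))
```

The table name written into the header has to match the `tableName` parameter of the CombiTimeTable instance in the testbench.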

Experimental Results
Finally, a simulation was executed for the generated Dymola model. The simulation duration was set to a time window of 30 days, with a timestep of 15 min. The results were post-processed in Python.

Performance Metrics
To evaluate the performance of the models, the metrics Coefficient of Determination ($R^2$), Coefficient of Variation of the Root Mean Square Error (CV-RMSE) and Mean Absolute Percentage Error (MAPE) (Falay et al., 2021) were selected. The performance metrics for the models are listed in Table 1. They show the higher accuracy of the RF model. However, the linear regression model performs only slightly worse than the RF model despite its simple structure. Figure 6 shows the timeseries analysis for the solar collector for a selected period of five days from the simulation duration. The timeseries analysis shows more accurate predictions of the data-driven model during daytime than during nighttime. This behavior was attributed to the characteristics of the solar collector, which is inactive during nighttime.
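The three metrics named above can be computed from the measured and simulated supply temperatures as in the following sketch. It uses the common textbook definitions and returns CV-RMSE and MAPE as fractions rather than percentages. The definitions used in (Falay et al., 2021) may differ in scaling, and the function names are chosen for this example only.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def cv_rmse(y_true, y_pred):
    """Root mean square error divided by the mean of the measured values."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse) / (sum(y_true) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction.

    Undefined if any measured value is zero.
    """
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Illustrative values only, not taken from the case study.
    measured = [10.0, 20.0, 30.0, 40.0]
    predicted = [12.0, 18.0, 33.0, 39.0]
    print(r_squared(measured, predicted))
    print(cv_rmse(measured, predicted))
    print(mape(measured, predicted))
```

Because MAPE divides by the measured value, it becomes very large when measured temperatures are close to zero, which should be kept in mind when comparing it with the other two metrics.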

Timeseries Analysis
The prediction error plots for the solar collector case study during the full simulation duration are depicted in Figure 7.

Conclusion
We present a framework for data-driven model creation and co-simulation that allows the combination of different models. The framework is implemented in Python and Dymola and is based on the FMI standard. This framework allows automatic creation of data-driven models, translation into the FMU format, creation of a Dymola testbench model and simulation in Dymola. A case study performed on an application from the energy domain showed the performance of the created data-driven models.

Future Work
The current version of the framework offers several options for extension. For instance, the model training part of the framework could be extended to support additional model types. The FMU creation part of the framework could be extended to support FMI 3.0, as well as to include further features of FMI 2.0. Finally, the Dymola simulation part could be extended to support different simulation masters such as OpenModelica.