Efficient Global Multi-Parameter Calibration for Complex System Models Using Machine-Learning Surrogates

In this work, we address challenges associated with multi-parameter calibration of computationally expensive complex system models. We propose to replace the Modelica model by a computationally cheap machine-learning surrogate for screening the parameter space, followed by polishing with a gradient-based optimizer coupled to the Modelica model. Our results show the advantage of this approach compared to commonly used optimization strategies: we can dispense with determining initial optimization values while requiring only a small number of Modelica model calls, paving the path towards efficient global optimization. The machine-learning surrogate, namely a Physics-Enhanced Latent Space Variational Autoencoder (PELS-VAE), is able to capture the impact of the most influential parameters even on small training sets and delivers sufficiently good starting values to the gradient-based optimizer. In order to make this paper self-contained, we give a sound overview of the necessary theory, namely Variational Autoencoders and global sensitivity analysis with Sobol indices.


Introduction
To enable model-based investigation of "real world" technical systems, the underlying Modelica system models can quickly grow in size and computational expense. When they are applied in extensive parameter studies, in particular for model calibration or model-based optimization, computation becomes a resource-intensive task: if the objective function cannot be decomposed into submodel dependencies but depends on the model as a 'whole', then the whole model needs to be simulated as well.
In practice, optimization based on such models is limited to a few varied parameters and to local, gradient-based optimization algorithms. If the modeller has sufficient knowledge of the model, reasonable choices of relevant parameters as well as starting points for the local optimization algorithm can be made from experience. But for complex models this empirical approach may suffer from overlooked parameters and from the optimization algorithm running into local minima of the objective function due to the chosen starting points in parameter space.
In this paper we address these issues with a combined approach: a machine-learning model, namely a Physics-Enhanced Latent Space Variational Autoencoder (PELS-VAE) (Martínez-Palomera, Bloom, and Abrahams 2020; Zhang and Mikelsons 2022), is trained on data generated by the Modelica model. It captures the dependencies of the model output on the most influential parameters, determined by a preceding sensitivity analysis (Sobol 1993), while requiring a limited set of training data. This surrogate is computationally cheap and can be used to apply a global optimization algorithm that relies on a large number of model runs. After this global screening, a subsequent local optimization based on the original physical model (polishing) is performed (Freund and Schmitz 2021). We choose to test our approach on a computationally inexpensive thermal Modelica model of a single office (Figure 1) with measurement data available for calibration (Freund and Schmitz 2021). Like this, data generation for the machine-learning models is fast, and we are able to focus on the application of the PELS-VAE for parameter calibration, while being able to cross-check all obtained results against a brute-force global optimization based on the original model. Various optimization tools suitable for Modelica models already exist, like the Dymola Optimization Library, GenOpt (University of California 2023), ModestPy for parameter estimation with FMUs (Arendt et al. 2018), AixCaliBuHA (Wüllhorst et al. 2022) or ModelOpt (XRG Simulation GmbH 2023). All of these tools vary in detail, but build on well-known global and local optimization algorithms like Particle Swarm Optimization, Genetic Algorithms, Sequential Least Squares or the Nelder-Mead algorithm, and do not generate surrogate models. In contrast to this, surrogate-based optimization aims to represent computationally expensive models by the use of a simpler surrogate to significantly save computational resources. Different kinds of surrogate models like linear regression, support vector regression, radial basis functions or kriging (Gaussian process regression) are commonly used (Bhosekar and Ierapetritou 2018). Artificial neural networks, as a generalization of regression models, are also a possible surrogate choice. A promising subclass is Bayesian Optimization, which consists of a probabilistic surrogate model and a sequentially evaluated acquisition function that enables optimal, active sampling of the objective function to be replaced (Shahriari et al. 2016). Bayesian Optimization proved efficient in parameter calibration of a Modelica-modelled HVAC system (Martinez-Viol et al. 2022). In comparison to these techniques, our approach replaces the actual physical model for a fixed scenario, not the cost function of an optimization objective.
This paper is organised as follows: Section 2 introduces the used Modelica model of an office room, the PELS-VAE architecture and training, as well as the applied optimization techniques. In Section 3 we present the results of applying our approach for calibration of the Modelica model. Finally, we summarize our findings and give an outlook on present and future work in Section 4. In Appendix B, we sketch the applied global sensitivity analysis.

Calibration Problem
The modelled thermal zone is a room of a large-scale office building (46 500 m²) with high energy efficiency (primary energy demand < 70 kWh m⁻²) (Freund and Schmitz 2021). The building's operation has been explored in previous research projects (Niemann and Schmitz 2020; Duus and Schmitz 2021; Freund and Schmitz 2021). For example, Model Predictive Control (MPC) was used to enhance thermal user comfort and decrease energy demand (Freund 2023). MPC requires accurate models, which can be obtained by calibrating Modelica models with measurement data. A scheme of an office is shown in Figure 1. Heat is supplied by thermally activated ceilings (TAC), i.e. by circulating warm water through pipes in the concrete core of the slabs, and by mechanical ventilation with preheated supply air. The large area of the ceilings allows the usage of heat pumps for low-temperature heating, while the high thermal capacity of the concrete slabs enables a considerable time delay between heat supply to the ceiling and heat supply to the room. For this building, measurement data has been recorded since 2014 at more than 1100 sensors every minute; 32 office spaces are equipped as reference zones with various sensors (Freund and Schmitz 2021). For this project, we use the same data as in prior studies (Freund 2023). The calibration target is to fit the model output T_Air to the recorded measurement T_Air,meas by adjusting the model parameters θ within their bounds [θ−, θ+], employing an error metric such as the Mean Squared Error (MSE):

  min_{θ ∈ [θ−, θ+]} MSE(θ) = (1/N) Σ_{i=1}^{N} (T_Air(t_i; θ) − T_Air,meas(t_i))²,

which is in general a constrained, nonlinear optimization problem.
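As a minimal sketch, the calibration objective above can be expressed as follows; `simulate` is a hypothetical stand-in for a run of the exported FMU and is not part of the paper's tooling:

```python
def mse(simulated, measured):
    """Mean squared error between two equally long temperature series."""
    assert len(simulated) == len(measured)
    return sum((s - m) ** 2 for s, m in zip(simulated, measured)) / len(simulated)

def objective(theta, simulate, measured):
    """Scalar calibration objective: simulate the model for theta, score vs. data."""
    return mse(simulate(theta), measured)
```

The bound constraints [θ−, θ+] are then enforced by the chosen optimizer, not by the objective itself.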
The recorded data consists of several time series that serve as inputs to the physical model of the thermal zone. The model inputs are the outside air temperature T_A, the supply temperature of the corresponding TAC heating circuit T_Sup,TAC, the boolean supply signal y_Sup,TAC, the supply temperature of the mechanical ventilation T_Sup,MV, the boolean supply signal y_Sup,MV, the global solar radiation and the occupancy state. For the heat exchange at sun-exposed walls, an equivalent outdoor air temperature T_A,Eq is used. Internal heat gains Q̇_Int by persons, lighting or other equipment are calculated by multiplying a constant heat gain factor with a heuristic based on the measured occupancy state and the building's electric energy consumption load profile. Internal and external heat gains are split into convective parts acting on the air volume and radiative parts acting on the internal masses. We use data of the identification timeframe 21.02.2018-14.03.2018 (Freund 2023).

Gray-Box Model
In this work, a gray-box model introduced by Freund and Schmitz (2021) shall be calibrated. Gray-box modeling refers to a modeling approach where a physical model is combined with data-driven approaches: physical knowledge is used to derive a model structure, while parameters are identified using measurement data (Kathirgamanathan et al. 2021).
The gray-box model (Figure 2) consists of seven resistances and four capacities (R7C4 model). Based on the EMPA model (Koschenz and Lehman 2000), a simplified model for the TAC is used, consisting of two resistances R_TAC1 and R_TAC2. By assuming equal room temperatures below and above the thermoactive ceiling, the two heat flow paths to the room above and below the ceiling, respectively, can be transformed into a single heat flow path, resulting in an R2C1 TAC model (Sourbron 2012).
The external wall is modeled with two resistances for the envelope (R_W1 and R_W2) and one resistance R_G for the glazing. Mechanical ventilation is represented with one resistance R_MV. The resistance R_Int describes the heat exchange between the air volume and the internal masses. Heat exchange between adjacent zones is neglected, since the heating control is the same for all zones of a building section.
Consequently, the simulation model has 11 parameters (see Table 1). Additionally, we introduce the parameter f_sol to tune the fraction of the window-projected global radiation flowing into the office and the parameter Q̇_Int as heat gain factor of the heuristic occupancy signal. The initial temperature T_TAC(t = 0) of the TAC, as the mass with the highest capacity, is introduced as a further parameter of the optimization problem. Estimated values for these parameters are obtained from the documentation of constructional elements and from literature values. These estimates are used to generate training data for the autoencoder models, which is for most parameters performed in the range of 1/5 to 5 times the estimated value. We choose these broad ranges in order to account for situations where little knowledge on the estimates is available. In practice they should be narrowed as much as possible by available information. The obtained model is exported using the Functional Mock-up Interface (FMI) standard and used in Python scripts with FMPy (FMPy 2023). We simulate with a time step of 1800 s.

Physics-Enhanced Latent Space Variational Autoencoder
The general idea of autoencoders is to encode data of a dataset into a lower-dimensional compression that is sufficient to represent the variation within that dataset. For example, a collection of images of people could be reduced to characteristics like gender, hair color, skin color, pose etc. From this compression, data can be reconstructed with a decoder that learned the influence of these attribute variations on the compression, allowing an image to be reconstructed from it. In general, the lower-dimensional compression is said to live in a "latent space", i.e. a space whose behavior is hidden and cryptic to us. An encoder-decoder neural network structure is an unsupervised learning technique. However, the representation of attributes in latent space can be learned, e.g. by a neural network ("Regressor"). By only using the Regressor and the Decoder, new data can be generated, such that a generative network similar to a Generative Adversarial Network (GAN) is obtained.
A challenge is to choose an adequate dimension for the latent space to prevent the network from just memorizing the data (Jordan 2018a). Various techniques have been proposed for this regularization; a widely used approach is to learn probability distributions within the autoencoder structure, making it a Variational Autoencoder (VAE). A hands-on explanation of autoencoders can be found in (Jordan 2018a), and of VAEs in (Jordan 2018b). Within this paper, we build on the implementation of (Zhang and Mikelsons 2022) to predict time series x (i.e. our temperature trajectories), with its architecture shown in Figure 3.
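The regularizing training objective behind this idea (detailed in Appendix A) takes, in its generic β-VAE form, the shape of a reconstruction term plus a Kullback-Leibler term; the exact weighting and loss used by (Zhang and Mikelsons 2022) may differ from this standard sketch:

```latex
\mathcal{L}(x) \;=\;
\underbrace{\mathbb{E}_{z \sim q_{\psi}(z \mid x)}\!\left[\lVert x - \phi_{de}(z) \rVert^{2}\right]}_{\text{reconstruction}}
\;+\; \beta \,
\underbrace{D_{\mathrm{KL}}\!\left(q_{\psi}(z \mid x) \,\big\Vert\, \mathcal{N}(0, I)\right)}_{\text{regularization}}
```

The KL term pulls the encoded distribution q_ψ(z|x) towards a standard normal, which is what prevents the latent space from degenerating into a lookup table of the training data.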
For the interested reader, a more detailed explanation of the theory behind the Autoencoder and its training loss function is provided in Appendix A.

Training Data Generation
To train the PELS-VAE model to mimic the behaviour of the physical model, i.e. to learn the behaviour x(θ), the machine-learning model needs to be exposed to labeled training data (x|θ). Therefore, we sample n times uniformly in parameter space,

  θ_i ∼ U(θ−, θ+), i = 1, …, n,

and run the physical Modelica model to obtain the corresponding output x for each θ. The training data should cover the parameter space as well as the output space well, which can be checked by plotting the corresponding confusion plots (combining every θ_i with each other) and by plotting all outputs of the physical model. Combining these plots of the model outputs with available measurement data allows a first check whether the designed physical model is able to capture the observed behaviour (see Figure 4).
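The sampling and labeling step above can be sketched as follows; `run_model` is a hypothetical placeholder for a call to the exported FMU (e.g. via FMPy), not the paper's actual data pipeline:

```python
import random

def sample_parameters(bounds, n, seed=0):
    """Draw n parameter vectors theta_i ~ U(theta_minus, theta_plus)."""
    rng = random.Random(seed)
    return [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n)]

def generate_training_data(bounds, n, run_model):
    """Return labeled pairs (x | theta) for surrogate training."""
    thetas = sample_parameters(bounds, n)
    return [(run_model(theta), theta) for theta in thetas]
```

Because the samples are independent, the n model runs parallelize trivially, which keeps data generation cheap even for slower models.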

Optimization-Based Parameter Identification
This paper aims to calibrate a model by minimizing the Mean Squared Error (MSE) between the model output and recorded measurements to determine a globally minimizing parameter combination.An overview of the applied methods is provided in Table 2 and discussed further below.
To demonstrate the superiority of our proposed method over existing optimization techniques, we combine an FMU of the Modelica model with selected optimization methods from SciPy and compare them with our introduced methods that use the Physics-Enhanced Latent Space Variational Autoencoder (PELS-VAE).
The investigated methods that combine a SciPy optimizer with an FMU encompass scalar or vector-like (residual) objectives, gradient-based or gradient-free methods, and can be categorized as either local or global optimization techniques. We anticipate gradient-based optimizers to converge quickly and expect further improvements for the LS-TRF approach, which utilizes residuals as the objective, as the optimizer gains more knowledge about the consequences of an optimization step compared to scalar objectives. On the other hand, we consider Differential Evolution, a genetic algorithm (GA), as a global optimization technique, albeit with the drawback of requiring a higher number of model evaluations. All local techniques in this study necessitate initial values for the parameters, which may be challenging to derive in practical applications. To address this, we combine each local technique with a multistart approach, where the optimization is initiated n_start times using starting values randomly distributed around the given initial parameter values.
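The multistart strategy can be sketched as follows. The local optimizer shown is a naive coordinate-descent stand-in for illustration only, not the SciPy optimizers used in the paper:

```python
import random

def local_descent(f, x0, step=0.1, iters=200):
    """Toy local optimizer: greedy coordinate steps with shrinking step size."""
    x, fx = list(x0), f(x0)
    for _ in range(iters):
        improved = False
        for i in range(len(x)):
            for d in (+step, -step):
                trial = list(x)
                trial[i] += d
                ft = f(trial)
                if ft < fx:
                    x, fx, improved = trial, ft, True
        if not improved:
            step *= 0.5  # refine once no coordinate step helps
    return x, fx

def multistart(f, x0, n_start=8, spread=0.5, seed=0):
    """Launch the local optimizer from n_start perturbed starting points, keep the best."""
    rng = random.Random(seed)
    best_x, best_f = local_descent(f, x0)
    for _ in range(n_start):
        start = [xi + rng.uniform(-spread, spread) for xi in x0]
        x, fx = local_descent(f, start)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f
```

The cost structure is visible directly: every additional start multiplies the number of (expensive) model evaluations, which is exactly the scaling problem the surrogate approach is meant to avoid.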
Based on these evaluations, we propose to combine a well-trained, computationally cheap PELS-VAE neural network (i.e. capable of evaluating 10 000 parameter combinations in a few seconds on a GPU) with a genetic algorithm to determine a parameter set that achieves global optimization. Additionally, we propose a 2-phase approach in which the parameter combination determined by the PELS-VAE coupled with a genetic algorithm serves as starting point for a polishing phase. The polishing phase employs the LS-TRF algorithm coupled with the FMU of the physical model to be calibrated. This approach is intended to compensate for inaccuracies that may arise when replacing the physical model with a machine-learning surrogate model.
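A simplified, self-contained sketch of this 2-phase idea, with plain random search standing in for the genetic algorithm and hypothetical `surrogate_loss`/`model_loss` callables in place of the PELS-VAE and the FMU:

```python
import random

def screen(surrogate_loss, bounds, n_eval=10_000, seed=0):
    """Phase 1: global screening on the cheap surrogate (random search stand-in for the GA)."""
    rng = random.Random(seed)
    best_theta, best_loss = None, float("inf")
    for _ in range(n_eval):
        theta = [rng.uniform(lo, hi) for lo, hi in bounds]
        loss = surrogate_loss(theta)
        if loss < best_loss:
            best_theta, best_loss = theta, loss
    return best_theta

def polish(model_loss, theta, step=0.05, iters=100):
    """Phase 2: local refinement against the expensive model (coordinate-descent stand-in for LS-TRF)."""
    loss = model_loss(theta)
    for _ in range(iters):
        improved = False
        for i in range(len(theta)):
            for d in (+step, -step):
                trial = list(theta)
                trial[i] += d
                t_loss = model_loss(trial)
                if t_loss < loss:
                    theta, loss, improved = trial, t_loss, True
        if not improved:
            step *= 0.5
    return theta, loss
```

The design intent is that the 10 000 cheap surrogate evaluations absorb the global search, so the expensive model is only called during the short polishing phase, which also corrects any bias the surrogate introduced.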

Sensitivity Analysis
In order to evaluate the impact of the different model parameters on the room temperature, we employ a sensitivity analysis based on Sobol indices as described in Appendix B. The result is shown in Figure 5, where the Sobol indices are plotted for each time step. Obviously, the impact of the different parameters changes over time: due to heating with TAC and air supply during daytime, the "passive" building properties UWin and rExt1 become less important. This can be used to potentially limit the number of parameters in the overall analysis or in the training of the autoencoder, as parameter dependencies with large impact are learned faster, that is, less training is required (see section 3.2). Often this will be sufficient for the global phase 1 of the optimization approach described in this paper (see section 3.3.3).

Autoencoder Training
The Autoencoder training was carried out using different numbers of samples n, a varied dimension of the latent space (dim(z x )), and varied dimension of the hidden layers.The analysis, shown in Figure 6, was performed using the same test set (n = 320) for all experiments.Firstly, the Mean Absolute Error (MAE) was observed to decay with an increasing number of samples.Specifically, for 32 samples, the MAE was approximately 2, which decreased to around 0.07 for 4096 samples.Notably, with 1024 samples, the MAE reached 0.1, and further quadrupling the sample size only resulted in marginal improvements.
Secondly, models trained with different hyperparameters show variation in MAE. Although at higher numbers of samples the variations may be tolerable, at n = 256 the influence of hyperparameter selection lies in the range of 0.4 to 1.1, which might not be appropriate.
Lastly, the dimension of the latent space (dim(z_x)) was found to scale with the complexity of the time series. Although the model had 14 parameters, the best latent-space dimensions were 64, 128, or 256. Reasons for this are that θ was also directly introduced into the decoder, and we found by inspection that the worst-performing models had low dim(z_x).
Overall, we find that the choice of hyperparameters changes the training outcome, although its influence is limited, i.e. all hyperparameter sets produced results in comparable ranges with no "failing" hyperparameter sets. We conclude from this that the autoencoder training is quite robust.
To further assess the performance of the autoencoder, we have evaluated the prediction error using test sets with varying variability. For large sample sizes n, the median of the training data lies at the midpoint θ̄_i of the parameter space in each dimension, with distance Δθ_i to each bound. To generate the test sets, we sample with different variabilities δ as follows:

  θ_i ∼ U(θ̄_i − δΔθ_i, θ̄_i + δΔθ_i).

We generate two kinds of test sets: a category in which all parameters are varied and a category in which only the six most important parameters (subsection 3.1, Figure 5) are varied. The Mean Absolute Error (MAE) for these test sets, evaluated on the models trained with the best-performing hyperparameters determined previously, is presented in Figure 7. For the following, we can exclude a discussion of the numerical influence of the parameter value magnitudes, as all parameters are normalized by mean and standard deviation before they are fed into the neural networks. First, we take a closer look at the variation of the important parameters (Figure 7a): we previously observed the MAE in Figure 6 with a variability of δ = 100%. However, by reducing the variability and excluding the border regions of the parameter space, the prediction error of the autoencoder decreases. For instance, in the case of a training size of n = 32, the error is reduced from 0.8 Kelvin to approximately 0.6 Kelvin at 80% variability.
Figure 7 also includes the 90th percentile of the prediction error. It is evident that certain predictions exhibit considerably higher error than the mean prediction error, which can pose challenges in the optimization process, particularly with autoencoder models trained on smaller datasets. However, by reducing the variability in the parameter space, the 90th percentile error also decreases. Secondly, an analysis is conducted to examine the variation of all parameters, as shown in Figure 7b. Interestingly, it is observed that for small sampling sizes (n = 32 to 128), reducing the variability δ leads to an increase in the mean absolute error (MAE), while this trend does not persist for larger sampling sizes. This finding may initially seem counterintuitive, as one might expect that when varying all parameters, the MAE would decrease with overall less variability, compared to varying only the important ones and leaving the others unrestricted. However, parameters that are considered less important contribute less to the observed variation in the output of the physical model. Consequently, when training sets are small, the autoencoder faces challenges in capturing the influence of these less important parameters on the observed trajectories. By limiting the variation of all parameters to, for example, δ = 0.2, a larger proportion of parameters resides in the inner part of the parameter space, which can be more difficult for the autoencoder to learn with small training sets, as the majority of the variance is produced by the influential parameters. Consequently, this results in an increase in the mean absolute error.
From this analysis, we conclude that the autoencoder performs better when predicting parameter combinations θ that are more centered within the training parameter space. When selecting the bounds [θ−, θ+], it should be ensured that they are larger than the parameter region where we anticipate the calibrated parameter results to lie. Furthermore, for small numbers of samples in the training set, the autoencoder has difficulties properly learning the influence of the less influential parameters on the model output, while still learning the impact of the more influential ones. However, to integrate the influence of the less influential parameters on the model output variation, they should still be sampled during training data generation. This property of the autoencoder makes a sensitivity analysis prior to training dispensable.

Model Calibration
In the following section, we present our findings regarding the curve-fitting methods minimizing the MSE between model output and measurement outlined in Table 2. This section is organized as follows: firstly, we present the results obtained from the optimizers directly coupled to the Modelica model's FMU, along with the corresponding multistart approach (refer to Table 3), and gain insights into the uniqueness of a solution. Secondly, we showcase the optimization results achieved using the surrogate PELS-VAE model (see Table 4) and highlight the advantages of our proposed method.

Direct Optimizer Coupling
First of all, one should keep in mind that the measurement signal is prone to error resulting from the measurement uncertainty of the temperature sensor (≤ 0.5 K (Freund 2023)), the data processing and the position of the sensor in the room. The calibration results obtained from the directly coupled optimizers are presented in Table 3. The majority of methods achieve a final Mean Squared Error (MSE) of approximately 0.01, although they vary significantly in terms of required model calls. Among the local optimizers, LS-TRF requires the lowest number of iterations, with 261 model calls from the given initial value. SLSQP follows with 3-6 times more iterations. The gradient-free optimizers Nelder-Mead and Powell perform less efficiently, requiring 5000 model calls (limited by the predefined iteration limit) from the given initial value. The notable difference between SLSQP with a scalar objective and LS-TRF with a vector-like/residual objective can be attributed to the fact that the residual objective allows, after calculating the gradient, a more detailed consideration of the consequences of optimizer steps.
When initial values are poorly known, global optimization or multistart strategies become necessary, which significantly increases the number of required model calls.

Uniqueness of Solution
To gain more insight into the uniqueness of the solution of our optimization problem, a more detailed analysis of the best-performing algorithm LS-TRF was performed. For this benchmark, a high number of starts (256) was chosen. The identified parameter combinations were clustered with K-Means clustering around common centroids (Pedregosa et al. 2011), with the 3 largest groups depicted in Figure 8a, while the 5% best solutions are shown in Figure 8b. From these results, we can infer two insights: First, the optimization problem is ambiguous: one parameter can compensate for the effect of another, e.g. in Figure 8b, a high capacity cExt1 of the external wall can compensate for a low heat resistance rExt2 and vice versa. Furthermore, the optimization problem is clearly non-convex, i.e. it has multiple local minima, and the identified parameter combination depends on the initial value when using a local optimizer, which can be seen from the difference between the MSE of the best 5% of results (0.0114) and the average value (0.0364). If the problem were convex, every initial value should lead to the same solution. Therefore, multiple parameter combinations might lead to equally well-performing calibrated models.
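The clustering step above can be sketched with a minimal k-means; the paper uses scikit-learn's implementation (Pedregosa et al. 2011), so this stand-in is for illustration only:

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Minimal k-means: group parameter vectors around k common centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest centroid by squared Euclidean distance
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # update step: each centroid moves to the mean of its assigned points
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids
```

Applied to the 256 identified parameter vectors (normalized, as in Figure 8), the cluster sizes reveal whether the multistart runs fall into a few distinct basins of attraction.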

Calibration with Surrogate PELS-VAE Model
The results of the model calibration performed with the surrogate PELS-VAE model are shown in Table 4 and Figure 9. To compensate for the surrogate's prediction error, polishing of the achieved results with the LS-TRF algorithm (local, gradient-based, vector-like objective) is performed. This process is illustrated in Figure 9. For all sampling sizes, the MSE is reduced considerably to a magnitude of MSE_f(θ_opt) ≈ 0.012. More importantly, comparing the results for n_train = 32, 64, 128, an MSE comparable to that of the LS-TRF directly coupled with the Modelica model with an initial guess is achieved (Table 3), while requiring fewer or comparably many model calls. This effect can be explained as follows: as the autoencoder learns the model reaction to different parameter combinations, especially for the most influential parameters (see subsection 3.2), it allows for a "screening" of the parameter space to find a good starting point for the following gradient-based optimization with the exact Modelica model. At higher sampling sizes, the prediction gap decreases, and the number of model calls is reduced as well.
Depending on the number of training samples, one might debate at which point the screening is sufficient to call this approach a "global method".
To stress the advantage of this proposed novel method of model calibration: using the autoencoder allows a screening of the parameter space, which relieves us of the burden of finding an initial value for the optimization, which could potentially even lead us into the "trap" of a local minimum.

Conclusion
In this paper, we address the challenges associated with physics-based Modelica models increasing in complexity and computational expense with regard to optimization-based multi-parameter calibration. To overcome these issues, we present a novel approach that enables computationally efficient parameter calibration by using a machine-learning surrogate.
To showcase the developed method, we use a simple thermal zone model implemented in Modelica, which allows us to focus on the analysis of the proposed method.
The machine-learning surrogate used is a Physics-Enhanced Latent Space Variational Autoencoder (PELS-VAE). It provides efficient model regularization and robust training. We propose to combine a PELS-VAE trained on a small dataset with a genetic algorithm (as PELS-VAE inference is computationally cheap) to screen the parameter space for well-performing parameter regions. To achieve the best results, we furthermore propose to polish the achieved result with a gradient-based residual-objective optimizer (LS-TRF).
To compare our approach to existing alternatives, we have tested a variety of optimizers and found significant variation in the number of required model calls and a strong dependence on initial values. When moving towards global optimization, the usage of multistart approaches or global optimizers quickly increases the number of model calls significantly, making this potentially infeasible for computationally demanding system models.
We were additionally able to show that the chosen optimization problem is non-convex and has ambiguous solutions.
We also performed a detailed analysis of the PELS-VAE application. By analyzing the training process, we find that hyperparameter variation has limited impact on the training process, i.e. the training is robust, while time series that are more centered within the training parameter space exhibit considerably lower prediction errors.
Our results provide evidence that even PELS-VAE models trained with small datasets (32-128 samples), despite the resulting high prediction errors, are effective for screening the parameter space for initial values, which are then used in a gradient-based optimizer. We provide indications that the PELS-VAE is able to capture the impact of the most influential parameters even on small training sets. Compared to the best-performing optimizer requiring an initial value, we were able to show that our initial-value-free method achieved a comparable MSE with a comparable number of model calls. In summary, our proposed method offers an effective solution for calibrating complex models.
Using the PELS-VAE models allows for a screening of the parameter space with a low number of model calls and relieves us from the burden of finding suitable initial values for local optimizers. For future work, our method will be applied to other examples like a white-box model of the office (see Figure 10) to prove its suitability for various kinds of optimization problems. Furthermore, the training process could be improved by adaptive online data generation, narrower parameter ranges, other layers in the network and the embedding of multiple Modelica model outputs.

Figure 1 .
Figure 1. Schematic of a standard single office, taken from (Freund and Schmitz 2021)
state variables T W (external wall temperature), T Air (indoor air temperature), T Int (temperature of internal masses) and T TAC (TAC core temperature) correspond to the four thermal capacities C W , C Air , C Int and C TAC .

Figure 3 .
Figure 3. Physics-Enhanced Latent Space Variational Autoencoder (PELS-VAE). The time series x is introduced to the Encoder ψ_en, which transforms it to a latent-space distribution with mean µ and variance σ. The Decoder ϕ_de reconstructs the time series as x̂ = (1/L) Σ_{l=1}^{L} ϕ_de(z_l), where the latent samples z_l are drawn with the auxiliary Gaussian variable ϵ. The Regressor φ_re is trained simultaneously to predict the mean and variance of the latent-space distribution. As in (Martínez-Palomera, Bloom, and Abrahams 2020), the physical parameters θ are introduced to all models. (Zhang and Mikelsons 2022)

Figure 6 .
Figure 6. Hyperparameter variation (latent-space dimension dim(z_x) and dimension of the hidden layers) over training sets with different numbers of samples n, tested with the same uniformly sampled test set, all within [θ−, θ+]. The best-performing model for each training dataset size is marked by a star. (Training performed for days 8-16 of the identification timeframe.)

Figure 5 .
Figure 5. First-order Sobol indices for the model parameters, plotted in order of their mean value.

Figure 7 .
Figure 7. Mean Absolute Error on randomly sampled test sets with different maximum deviations of the parameter combinations from the median value of the training data (θ_i ∼ U(θ̄_i − δΔθ_i, θ̄_i + δΔθ_i)). (a) Only the important parameters are sampled with δ; for the non-important parameters δ = 1. (b) All parameters are randomly sampled. The mean absolute error over all time series of a test set as well as the 90% quantile is given for the best-performing models trained with different numbers of time series. (Training performed for days 0-10 of the identification timeframe.)

Figure 8 .Figure 9 .
Figure 8. Identified normalized parameters for Least-Squares Trust Region Reflective Algorithm with 256 starts.

Figure 10 .
Figure 10. Physics-based model of the office with XRG Simulation's HumanComfort Library

Table 1 .
Description of the 14 RC-Model Parameters and Corresponding Parameters of the Modelica Model.

RC-Model Description Modelica Model
RC-Model        Description               Modelica Model
…               …                         rExt2
R_G             Window Resistance         UWin
R_MV            Mechanical Ventilation    VSup
R_Int           Internal Heat Exchange    rInt
f_sol           Solar Gain Fraction       fSol
Q̇_Int           Internal Heat Gains       qIntOcc
T_TAC(t = 0)    Initial Value             …

Validation and test sets have a size of 320 samples. To make results comparable, the validation and test sets are the same for all models. The validation set is used to validate the model during the training process, to select well-generalizing models, and to stop the training early if no further improvement occurs. The test set is used to determine the final performance of the model, unbiased by the model selection through the validation set.
As the purpose of this paper is to determine possible reduction in required simulations of the physical model, we generate training sets with different sizes in the range (32 to 4096).

Table 2 .
Optimization methods used in this paper for calibration. If the objective is scalar, the MSE = µ((x_sim − x_meas)²), i.e. the mean of the squared deviations, is used as objective; if the objective is residual, the vector of squared residuals at the simulation time steps, [(x_sim(t_0) − x_meas(t_0))², (x_sim(t_0 + Δt) − x_meas(t_0 + Δt))², …], is used as objective. For all techniques from SciPy, default settings are used. Iterations are limited to reasonable values and tolerances are adapted to the FMU settings.

Table 3 .
Calibration Results of Methods which were directly coupled with the Modelica Model's FMU with achieved MSE f (θ opt ).
For the MSE calculated based on the autoencoder prediction (MSE_{ϕde,φre}(θ̂_opt)), MSE values comparable to the direct coupling of optimizer and Modelica model (≈ 0.01) are achieved. However, as shown in subsection 3.2, the autoencoder is prone to prediction errors, i.e. for some parameter combinations, the predicted room temperature trajectories are more faulty than for others. Because of this, the MSE of the parameter combination determined by the GA and the autoencoder, denoted as θ̂_opt, calculated with the Modelica model f, MSE_f(θ̂_opt), can be considerably larger than the predicted MSE_{ϕde,φre}(θ̂_opt). This effect occurs at low numbers of training samples and decreases with higher sampling sizes n_train, i.e. an MSE gap of 1.05 to 3.44 at n_train = 32 to 128 is reduced to a gap of 0.02 to 0.22 at n_train = 512 to 4096. Although this increase might be negligible at low magnitude, for the results obtained with a low number of training samples it might

Table 4 .
Calibration results of the PELS-VAE coupled with the Differential Evolution genetic algorithm (GA) for different numbers of training samples n_train; MSE calculated by the PELS-VAE (MSE_{ϕde,φre}(θ̂_opt)) and with the FMU (MSE_f(θ̂_opt)) to determine the prediction error introduced by the autoencoder; number of steps n_opt of polishing with LS-TRF and achieved MSE_f(θ_opt).