Hybrid Power Systems Simulation and Optimization Utilizing SSP and FMI

Collaborative model-based development of the hybrid power system often requires large-scale co-simulation and system parameter optimization. In this study, we investigate an architecture for parallel processing simulation of SSP (System Structure and Parametrization) and FMI (Functional Mock-up Interface), which enables high-speed computation by multi-core distribution. We combine Bayesian optimization and co-simulation, then we build a collaborative development platform for hybrid power systems design. We report performance experiments using hybrid electric vehicle simulation model published by JAMBE (Japan Automotive Model-Based Engineering center).


Introduction
Hybrid power systems are a combination of power generation and energy storage systems (batteries and fuel cells).As shown in Figure 1, a hybrid power system consists of systems for both power supply and power demand stakeholders, and each stakeholder has its own methods and tools for system design and analysis.Hybrid electric vehicles (HEVs) also consist of various components such as engines, electric motors, DC-DC converters, DC-AC inverters, and batteries, and OEMs and suppliers work together for system design and analysis.
This means that system designing should consider different perspectives, which requires the use of several different tools and methodologies.However, connecting tools and exchanging information between different teams often leads to inefficiencies in system development.Since connecting tools from different vendors is not guaranteed to work and not supported by each vendor, it is necessary for the tool users to build and maintain their own environment for connecting tools.
For these reasons, interoperability standards for model exchanging between tools are important.FMI (Functional Mock-up Interface) and SSP (System Structure and Parametrization) are standardized by Modelica Association to establish interoperability between tools for model exchanging at various levels of abstraction.
SmartSE project of prostep ivip Association has published "SmartSE Recommendation" (SmartSE 2023) for simulation-based system design and decision making in collaborative development between multiple design teams across companies based on the utilization of FMI and SSP standard.

FMI (Functional Mock-up Interface)
FMI (Functional Mock-up Interface) is common interface and file format that allows simulation models to be passed between different tools.FMI standard is available on the official website (ref.FMI website).There are many tools over hundreds that support the FMI standard, and models can be exchanged between these tools.
FMU (Functional Mock-up Unit), which is zipped compressed file format, contains modelDescription.xmlfile in XML document format and a library file in binary format (DLL on Windows systems, SO on Linux systems) that implements the model's definition expressions or solver functions.
The format of modelDescription.xml is defined by the standard, and it contains information such as the names, types, and other attributes of the input and output signals of the model stored in the FMU, and a list of parameters that can be set and changed from outside the FMU.
There are two types of FMI standards "Model Exchange" interface and "Co-simulation" interface.Model Exchange FMU contains only the model equations.Co-simulation FMU includes both model equations and solver.
Model Exchange interface assumes that the model equations in multiple FMUs are aggregated and computed by a single solver.This makes it difficult to speed up the calculation by distributed parallelization, and it is difficult to guarantee consistency between the results of a single solver and those of distributed parallel computation.The reason is that each individual Co-simulation FMU contains a model and a solver, so each FMU can be computed independently.

SSP (System Structure and
Parametrization) Like FMU, SSP is a compressed file in zip format, and its interior consists of several sub-formats and its interior consists of several sub-formats.SSD (System Structure Description) is an XML file that describes the hierarchical structure, connection relationships, and functional structure of the entire FMU network.

Distributed co-simulation standards
In collaborative development through model exchange, there are many opportunities for large-scale simulation that combine model components created by multiple teams.Most commercial simulation tools calculate models sequentially, so when model parts are exchanged between companies or design teams using FMI and SSP, the simulation speed decreases as the number of FMUs for model parts increases.
Distributed co-simulation is expected to speed up largescale simulations by running models in parallel.Distributed co-simulation may be run in a multi-core distribution within a single machine, on a set of machines interconnected via a local area network, or on globally distributed computers communicating via the Internet.real-time mode that connects a real-time system (actual device) and a simulator, and non-real-time mode that is used to connect virtual simulators.In both modes, the role of controller (equivalent to RTI in HLA standard) is limited compared to HLA.For data communication, each simulator node communicates directly with the other simulator node on a point-to-point basis.DCP protocol configures mesh-type network for distributed simulation.In the real-time mode of DCP protocol, each node runs using its own timer, so there is no explicit synchronization of time between nodes.In the non-real-time mode, the controller sends clock signal to each node, and the simulator on each node runs explicit synchronized to the clock signal.

Contributions of this paper
In this paper, we design a multi-core distributed simulator which performs high speed co-simulations that connect many model parts (FMUs).Next, we combine Bayesian optimization and distributed co-simulation to create a toolset that can automatically perform parameter optimization of hybrid power systems design.

Computation of distributed cosimulation
This section considers distributed co-simulation for SSP and FMI that can perform parallel computation with multi-core distribution.
The upper part of Figure 2 shows the sequence of execution of the calculations of each simulator connected to the distributed simulation, the time synchronization for distributed simulation, and the data distribution.The lower part of Figure 2 shows the case of simulation by sequential computation without distributed computation as a comparison.
In the upper part of Figure 2, each of the three simulators computes independently and in parallel until the logical time set as the synchronization timing (coupling point), at which point the simulators exchange output values with the other simulators.After that, the three simulators again perform calculations in parallel and exchange output values with the other simulators at the next synchronization timing.This process is repeated.This figure shows a simple case in which the communication step size of the three simulators is the same and does not vary, but there may be other cases in which the synchronization period varies, or in which each simulator has a different synchronization period or synchronization timing.
In contrast, in the case of sequential calculations as in the lower part of Figure 2, which do not involve distributed calculations, the three calculations are repeated in sequence using a set of simulators.Most commercial simulation tools are considered to be sequential.
Comparing distributed and sequential computation cases, the distributed simulator is expected to increase simulation speed compared to the sequential case.However, the simulation speed does not increase linearly with the degree of parallelism.In the case of distributed computation, there is the overhead of synchronization and signal transmission, so it is important how lightly this process can be made and how much the computation time can be shortened.It should also be assumed that the computational complexity of each model component is not uniform, so even if distributed computation is used, there is a tendency for the speed to decrease when there is a model with a large computational complexity.
In Figure 2, all simulators synchronize at the same period, but in general, each simulator may synchronize at a different period.DSP described in section 1.3 is suitable for distributed co-simulations that synchronize at a single cycle, but not for distributed simulations that synchronize at different cycles, because it synchronizes by distributing clock signals from the controller node to the slaves.On the other hand, IEEE 1516 HLA is suitable for both singleperiod and multi-period synchronization because it schedules slave nodes by supervising the global time in the controller node.Therefore, this paper adopts the IEEE 1516 HLA mechanism.This mechanism corresponds to the RTI (Run-time Infrastructure) of the IEEE1516 HLA.The signal exchange service also reads SSP file and sets up the signal connection relationship between FMUs and parameters.
Since "FMI Executable," "Python," and "Distributed simulation controller" are independent processes, each process will be distributed across multiple CPU cores when run on a multi-core CPU machine.The number of processors and CPU core allocation can be changed using the processor affinity option in Microsoft Windows.Processor Affinity, also called CPU pinning, allows the user to assign a process to use only a few cores.
Inter-process communication between "FMI Executable," "Python," and "Distributed simulation controller" uses shared memory between processes on the same machine and TCP communication between processes on different machines.
Figure 4 shows the execution screen of the SSP-FMI simulator, with the "Distributed simulation controller" screen on the left and the simulation output signal time series on the right, plotted as a graph using Python's Matplotlib library (ref.Matplotlib website).

Performance evaluation using hybrid vehicle simulation
This section reports the performance evaluation of distributed SSP-FMI simulation by using series parallel hybrid electric vehicle (HEV) simulation model shown in  The original model was built in MathWorks Simulink.We divided Simulink model and exported in FMI format.The number of FMI files for the hybrid vehicle model is 21 files divided by the block units of BXXX in Figure 5, and the CVT vehicle model is 7 files divided by the block units of color marks in Figure 6.
Each simulation was performed with the input of 1800 second standard driving pattern of WLTC (Worldwide harmonized Light vehicles Test Cycles) class 3b developed by UNECE (United Nations Economic Commission for Europe).Both simulations were run with a sampling time of 2.5 millisecond which is the same as the original JAMBE model.
We measured the RTF (real-time factor) of simulation speed using two different machines.The first machine equips Intel Core i7-10870H processor with 8 cores and 16 threads.The second machine equips Intel Core i7-8700 processor with 6 cores and 12 threads.
We used Windows 10 as the OS.The SSP-FMI simulator we developed allows the user to select the number of CPU cores (threads) used in the calculation using the processor affinity option of Windows, and we compared the simulation speed when using a single CPU and when using multiple CPU cores (threads).
Figure 7 shows the measurement results on the machine with Intel Core i7-10870H processor.The vertical axis represents RTF.The horizontal axis is the number of CPU cores (threads) used.The upper measurement is for a CVT vehicle simulation with 7 FMUs, and the lower measurement is for a hybrid vehicle simulation with 21 FMUs.Figure 8 shows the measurement results using the machine with Intel Core i7-8700 processor, and the notation is the same as in Figure 7.
The measurement results show that increasing the number of CPUs used in a distributed calculation can increase the speed by up to a factor of two compared to a calculation using a single CPU.However, it was also found that speed increases only up to 8 CPU cores (threads) and that speed tends to decrease slightly when more than 8 CPU cores (threads) are used.This is thought to be because excessive use of processor CPU cores (threads) affects the execution of non-simulation processes and has the opposite effect on the performance of distributed computation.
Table 1 compares RTF of each individual component FMU in a hybrid vehicle simulation using 21 FMUs with 8 threads on a Core i7-10870H, measuring the execution time that "FMI Executable" was running.It can be seen that the simulation speed varies by a factor of several depending on the complexity of the model included in the FMU and the amount of calculation.The overall simulation RTF is 21.785 while the RTF of the single FMU of HV_CNT (hybrid control controller) is close to this at 26.651, indicating that the calculation of HV_CNT is the overall speed-determining factor.This shows that the overall simulation speed tends to be dragged down by the computationally intensive FMU, and that no further speed-up can be expected in the hybrid vehicle simulation even if the number of threads is increased to 8 or more.
By measuring the overall RTF and RTFs of each individual FMU in this manner, it is believed that it is possible to determine the optimal number of CPU cores (threads) that will provide the maximum simulation speed in the parallel distributed SSP-FMI simulator.
4 System parameter optimization using SSP-FMI co-simulation

Framework of system parameter optimizetion
This chapter describes an application of SSP-FMI cosimulation to system parameter optimization.
Collaborative and rapid development of hybrid power supply systems often requires various configuration and many control parameters to be optimized.It also requires to facilitate model exchange between partners while keeping confidentiality of model.We think distributed cosimulation utilizing model interoperability standard FMI and SSP and Bayesian optimization will be solve the problems.
Figure 9 illustrates the framework of collaborative development platform for hybrid power suppy systems.In this framework, model parts are collected from partners in FMI format and optimum parameter set can be searched by Bayesian optimizer and distributed co-simulation.

Optimization set-up
A flowchart for optimization functionality is shown in Figure 10, where an initial value is first generated (either randomly or by user input).The SSP file for the cosimulation is then modified, where the values of the parameters to be optimized over are changed.This allows the co-simulation to be run for the chosen parameter values.
Once the co-simulation has finished, the cost function to be optimized can be extracted from the output.Based on the parameter and output values, an optimization module can determine the next point to evaluate.

Optimization algorithms
Compared to most optimization problems, the cost function in co-simulation is often time-and resourceconsuming.The system may comprise a large number of subsystems and even with parallel and distributed simulation, it can take a long time to evaluate its performance.Furthermore, the inner workings of the subsystems are often not known to the co-simulation master as they may originate from different vendors or developers.Hence, we can treat the cost function as a black box which is expensive to evaluate.
For this kind of optimization problem, a suitable algorithm is Bayesian optimization (Brochu09).This works by placing a Gaussian prior on the function and updating the posterior distribution based on the observed input and output values.This can be used to compute the best next value to evaluate, where criteria such as "probability of improvement" and "expected improvement" can be used.The advantage of Bayesian optimization is that it requires few evaluations of the cost function, as opposed to, e.g., evolutionary methods (NSGA-II, genetic algorithms, particle swarms, etc.) which need many evaluations (Emmerich18).
It should be pointed out that it is possible to consider more than one cost function.Multi objective optimization is common in large and complex systems, where there are many metrics by which a system performance can be measured in.These are often conflicting, and an optimal trade-off is sought.Bayesian optimization can be extended to multi objective optimization, e.g., with efficient algorithms such as TSEMO (Bradford18).

Optimization example
As an example of system parameter optimization, we considered the JAMBE HEV model.In particular we considered the role the clutch thresholds play in the propulsion.As shown below, the clutch helps activate electric only (Figure 11a) or hybrid-electric assist (Figure 11b).This is determined by, among other things, a threshold to open and a threshold to close it (measured in vehicle speed, km/h).These two thresholds were chosen as the system parameters to optimize over.The cost function was set as the fuel consumption during the WLTC class3b test drive cycle shown in Figure 12.

Optimization results
Using the MATLAB toolbox, Bayesian optimization of the clutch thresholds was implemented.Note that this allows the Optimization module in Figure 10 to output a suggested next value to evaluate in the next co-simulation, which was based on the "expected improvement" criterion.The cost function was extracted from the cosimulations as the fuel consumption after the 1800 seconds drive cycle.This is measured in km/l, so we are looking for its maximum.
The results are shown in Figure 13, where the fuel consumption (in km/l) is plotted against the two clutch thresholds parameter which defined in the JAMBE HEV model.
HV_CNT_Clutch_ON_threshold_vel_kmph means that closing clutch is possible above this speed.HV_CNT_Clutch_OFF_threshold_vel_kmph means that opening clutch is prohibited below this speed.
To appreciate the optimization result, we also performed an exhaustive evaluation for all feasible parameter values.This involved around 400 cosimulations, whereas the optimization approach only used 10.Compared with the default parameter values which defined in the JAMBE HEV model (red circle), the optimized values (blue circle) showed a 2% improvement in fuel consumption.Although this is quite small, more gains can be had by considering other blocks, possibly in combination with each other.

Conclusion
FMI and SSP standards establish model exchange at various levels of abstraction and interoperability between tools.In this paper, we investigated a configuration of SSP-FMI simulator that enables parallel computation by multi-core distribution.We also examined application of SSP-FMI simulator to the system parameter optimization.We are planning to apply collaborative development platform to the development of electric vehicles (integration of batteries, BMS, power trains, vehicle dynamics, etc), hybrid electric aircrafts (hydrogen fuel cells with batteries and high-performance electric motors) and offshore wind turbines (optimize efficiency and cost by wind & wave prediction).
We also plan to support co-simulation interface of FMI3.0 standard, whose official specification will be issued in 2022, and the newly introduced Scheduled Execution (SE) interface with the distributed parallel simulation in this paper.

Figure 1 .
Figure 1.DX in Hybrid Power Supply Systems Design.

3
Distributed SSP-FMI co-simulation 3.1 Architecture of distributed simulator This section shows the architecture of SSP-FMI simulator with distributed computation shown in Figure 3. SSP-FMI simulator was developed using distributed co-simulation platform VenetDCP from Toshiba Digital Solutions (ref.VeneDCP website)."FMI Executable" loads and executes Co-simulation 2.0 interface FMU file.FMI standard defines the function APIs used to initialize the FMU and execute model computation.These functions are stored in DLL binary library file in the FMU."FMI Executable" unzips FMU file, obtains information of input/output signals and parameters from modelDescription.xml file, and calls the function APIs in DLL binary library file to drive and simulate the imported FMU.In co-simulation where multiple FMUs are running, multiple FMI executables are launched for individual FMUs to achieve parallel distributed computation.Recording and monitoring of the test data time series signal input and output signal time series are performed using Python.

Figure 2 .
Figure 2. Comparison of computation between distributed co-simulation and sequential co-simulation.

Figure 11 .
Figure 11.Scenario for the considered JAMBE model.

Figure 12 .
Figure 12.WLTC Class 3b test driving cycle consisting of Low, Medium, High and Extra High phases.

Figure 9 .
Figure 9. Framework of collaborative development platform.

Figure 13 .
Figure 13.Optimization results for the co-simulated JAMBE model.The red and blue circles represent the default and optimized operational values, respectively.

Table 1 .
Comparison of RTF for each FMI model.(Series parallel hybrid vehicle)