Enhancing Indoor Temperature Forecasting through Synthetic Data in Low-Data Regime


  • Massimiliano Ruocco
  • Zachari Thiry
  • Alessandro Nocente
  • Michail Spitieris




Forecasting indoor temperatures is essential for efficient control of HVAC systems. Limited data availability is a challenge in this task: most available data is acquired during standard operation, where extreme scenarios and transitory regimes such as major temperature increases or decreases are de facto excluded. Acquiring such data requires significant energy consumption and a dedicated facility, which limits the quantity and diversity of available data. We acquire our data in such a facility, referred to as the Test-cell. Cost-related constraints, however, do not allow for continuous year-round acquisition. To address this, we investigate the efficacy of data augmentation techniques, particularly state-of-the-art (SoTA) AI-based methods for synthetic data generation. Motivated by practical and experimental considerations, we explore strategies for fusing real and synthetic data to improve forecasting models. This approach alleviates the need for continuously acquiring extensive time series data, especially in contexts involving repetitive heating and cooling cycles in buildings. Our evaluation of synthetic data is twofold: first, we assess the performance of synthetic data generators independently, focusing on SoTA AI-based methods; second, we measure the utility of synthetically augmented data in a downstream forecasting task. In the forecasting task, we employ a simple model in two distinct scenarios: 1) we combine real and synthetically generated data to expand the training dataset, and 2) we use synthetic data to address dataset imbalances. Our results highlight the potential of synthetic data augmentation to enhance forecasting accuracy while mitigating training variance. Through empirical experiments, we show that integrating synthetic data yields significant improvements, paving the way for more robust forecasting models in low-data regimes.
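
The two fusion scenarios can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the window shape (24 time steps), the integer regime labels (0 for steady operation, 1 for transitory), and the random arrays standing in for real and generated temperature windows are all assumptions made for illustration. Defining imbalance by regime counts is likewise an assumption, since the abstract does not specify it.

```python
import numpy as np


def expand_with_synthetic(real, synthetic, n_synthetic, rng):
    """Scenario 1 (assumed form): append a random subset of synthetic windows."""
    idx = rng.choice(len(synthetic), size=n_synthetic, replace=False)
    return np.concatenate([real, synthetic[idx]], axis=0)


def rebalance_with_synthetic(real, real_regime, synthetic, synthetic_regime, rng):
    """Scenario 2 (assumed form): top up under-represented regimes with synthetic windows."""
    regimes, counts = np.unique(real_regime, return_counts=True)
    target = counts.max()  # match the most frequent regime in the real data
    parts, labels = [real], [real_regime]
    for regime, count in zip(regimes, counts):
        pool = np.flatnonzero(synthetic_regime == regime)
        n_add = min(target - count, len(pool))
        if n_add > 0:
            idx = rng.choice(pool, size=n_add, replace=False)
            parts.append(synthetic[idx])
            labels.append(synthetic_regime[idx])
    return np.concatenate(parts, axis=0), np.concatenate(labels, axis=0)


# Placeholder data: random windows stand in for measured and generated series.
rng = np.random.default_rng(0)
real = rng.normal(21.0, 1.0, size=(100, 24))
real_regime = np.array([0] * 90 + [1] * 10)        # transitory regime is rare
synthetic = rng.normal(21.0, 2.0, size=(200, 24))
synthetic_regime = np.array([0] * 100 + [1] * 100)

expanded = expand_with_synthetic(real, synthetic, 50, rng)
balanced, balanced_regime = rebalance_with_synthetic(
    real, real_regime, synthetic, synthetic_regime, rng
)
print(expanded.shape, balanced.shape, np.bincount(balanced_regime))
```

In both cases the fused array would then be used to train the forecasting model, while evaluation would remain on held-out real data only, so that any gain reflects usefulness on real measurements.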