Weight Rescaling: Applying Initialization Strategies During Training


  • Lukas Niehaus
  • Ulf Krumnack
  • Gunther Heidemann
The training success of deep learning is known to depend on the initial statistics of neural network parameters. Various strategies have been developed to determine suitable means and standard deviations for weight distributions based on network architecture. However, during training, weights often diverge from their initial scale. This paper introduces the novel concept of weight rescaling, which enforces weights to remain within their initial regime throughout the training process. It is demonstrated that weight rescaling serves as an effective regularization method, reducing overfitting and stabilizing training while improving neural network performance. The approach rescales weight vector magnitudes to match the conditions of the initialization method without altering their direction. It has minimal memory usage and low computational overhead, and achieves results comparable to weight decay, but without introducing additional hyperparameters, as it leverages architectural information. Empirical testing shows improved performance across various architectures, even when combined with additional regularization methods such as dropout in AlexNet and batch normalization in ResNet-50. The effectiveness of weight rescaling is further supported by a thorough statistical evaluation.
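The core operation described above, rescaling weight vector magnitudes to their initialization-time scale while preserving direction, can be sketched as follows. This is a minimal illustration, not the authors' exact procedure: the abstract does not specify whether rescaling is applied per output neuron or per layer, nor which initialization scheme is assumed, so this sketch assumes per-neuron rescaling under Kaiming (He) initialization, where each fan_in-dimensional weight vector has standard deviation sqrt(2/fan_in) and hence expected magnitude sqrt(2). The function name `rescale_weights` is hypothetical.

```python
import numpy as np

def rescale_weights(W):
    """Rescale each output neuron's weight vector to the magnitude
    expected under Kaiming (He) initialization, keeping its direction.

    W has shape (fan_out, fan_in); each row is one neuron's weights.
    Hypothetical sketch -- per-neuron rescaling and the Kaiming target
    scale are assumptions, not details taken from the abstract.
    """
    fan_in = W.shape[1]
    # Std of each weight under Kaiming initialization for ReLU networks.
    target_std = np.sqrt(2.0 / fan_in)
    # Expected Euclidean norm of a fan_in-dim vector with that std.
    target_norm = target_std * np.sqrt(fan_in)  # equals sqrt(2)
    # Current per-row norms; epsilon guards against division by zero.
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return W * (target_norm / np.maximum(norms, 1e-12))
```

In practice such a step would be applied after each optimizer update (or at some fixed interval), so the weights never drift far from the scale the initialization scheme prescribes; because only magnitudes change, the learned directions of the weight vectors are untouched.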