# Strategies to Minimize Data Sample Size for Regression-Based Pump/Motor Models

## DOI:

https://doi.org/10.3384/ecp182p134

## Keywords:

RMS of Residuals, Progressively Sequenced Regression Analysis, Latin Hypercube sampling, minimum sample size, vertex mining, vertex sequencing, hyperspace vertexes, convergence plateau, pump flow model

## Abstract

This work presents a method for tracking how the regression coefficients, and the Root-Mean-Square (RMS) of the residuals, evolve on a test dataset for a hydraulic pump. The method regresses a sequenced data set iteratively: one new data sample is added at a time, and the regression is repeated at each iteration. This process was named Progressively Sequenced Regression Analysis, shortened to “PSR analysis” in this paper. The motivating and guiding postulate of PSR analysis is that, if sampling theory holds, the regression coefficients and the statistical figures of merit must reach a plateau. It was anticipated at the outset that both the regression coefficients and the Residual RMS would converge on their respective plateau values; however, the coefficients proved to be very volatile, some more so than others. Tracking the Residual RMS was found to be the more reliable measure of information saturation because its convergence is more obvious, provided that the samples are sequenced using the experience gained from performing PSR analysis. This paper explains how orthogonally sequenced data can be mined for the limits, or hyperspace vertexes, of the sampled data, and how the source data can then be optimally sequenced (rearranged) to produce results that are as efficacious as Latin Hypercube (LHC) sampling for achieving information saturation at a predictable number of samples. PSR analysis has led to an objective method for verifying that the proper arrangement, i.e., optimized sequencing, of the source data set can predict the condition of information saturation and the minimum useful sample size. The paper ends with a postulate of how this can be achieved using a combination of LHC sampling and either vertex pre-test planning or vertex mining of legacy data. The paper concentrates solely on the output flow model of hydraulic positive displacement pumps.
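The iterative procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The linear flow model, its coefficient values, the normalized speed and pressure ranges, the noise level, and the names `psr_analysis`, `design_matrix`, and `make_data` are all assumptions introduced here for illustration. The sketch refits an ordinary least-squares model each time one more sample is appended, then records the coefficients and the RMS of the residuals on a separate test set.

```python
import numpy as np


def psr_analysis(X_seq, y_seq, X_test, y_test):
    """Refit least squares after each added sample.

    Rows of X_seq/y_seq are used in the order given, so the row order
    is the sample sequencing being studied. Returns the coefficient
    history and the test-set residual RMS history.
    """
    n_samples, n_coef = X_seq.shape
    coef_history, rms_history = [], []
    # Start with as many samples as coefficients, then add one at a time.
    for k in range(n_coef, n_samples + 1):
        coef, *_ = np.linalg.lstsq(X_seq[:k], y_seq[:k], rcond=None)
        residuals = y_test - X_test @ coef
        coef_history.append(coef)
        rms_history.append(np.sqrt(np.mean(residuals ** 2)))
    return np.array(coef_history), np.array(rms_history)


def design_matrix(speed, pressure):
    # Hypothetical flow model: Q = c0 + c1 * speed + c2 * pressure.
    return np.column_stack([np.ones_like(speed), speed, pressure])


# Synthetic data only; none of these values come from the paper.
rng = np.random.default_rng(0)
TRUE_COEF = np.array([0.0, 100.0, -5.0])  # assumed coefficients
NOISE_SD = 0.2                            # assumed measurement noise


def make_data(n):
    speed = rng.uniform(0.2, 1.0, n)     # normalized shaft speed (assumed)
    pressure = rng.uniform(0.0, 1.0, n)  # normalized pressure (assumed)
    X = design_matrix(speed, pressure)
    y = X @ TRUE_COEF + rng.normal(0.0, NOISE_SD, n)
    return X, y


X_seq, y_seq = make_data(40)     # sequenced samples, randomly ordered here
X_test, y_test = make_data(200)  # held-out test set
coef_hist, rms_hist = psr_analysis(X_seq, y_seq, X_test, y_test)
```

Plotting `rms_hist` and each column of `coef_hist` against the number of samples used gives the kind of convergence traces the abstract refers to. Reordering the rows of `X_seq` and `y_seq` before the call is how a different sample sequencing would be compared in this sketch.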