Assessing Curve Fit Quality

WinWCP V5.3.8


Iterative curve fitting is a numerical approximation technique and is not without its limitations. In some circumstances it can fail to converge to a meaningful answer; in others, the best-fit parameters may be poorly defined. It is important to assess how well the function fits the data before placing too much reliance on the parameters.

Does the chosen function provide a good fit to the data?

One assessment of the goodness of fit is to compare the variance of the residual differences between the best-fit function and the data with the background variance of the signal. If the function provides a poor fit to the data, the residual variance will be significantly greater than the variance of the random background noise on the signal. The distribution of the residuals, as displayed in the residuals plot, is also important. Deviations should be randomly distributed over the fitted region of the record. If the fitted line is consistently higher than the data points in some parts of the record and lower in others, the signal is not well represented by the chosen equation.
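
As an illustration of this check only (a Python/SciPy sketch, not part of WinWCP; the waveform, noise level and time base are invented for the example), the following code fits a single exponential to a simulated record, compares the residual variance with the background noise variance, and counts sign changes in the residuals as a crude test of whether the deviations are randomly distributed:

    import numpy as np
    from scipy.optimize import curve_fit

    def single_exp(t, a, tau, c):
        return a * np.exp(-t / tau) + c

    rng = np.random.default_rng(0)
    t = np.linspace(0, 50e-3, 500)            # 50 ms record, 500 samples (assumed)
    noise_sd = 0.05                           # assumed background noise s.d.
    y = single_exp(t, 1.0, 10e-3, 0.1) + rng.normal(0, noise_sd, t.size)

    popt, pcov = curve_fit(single_exp, t, y, p0=(1.0, 5e-3, 0.0))
    residuals = y - single_exp(t, *popt)

    # Residual variance vs. background variance. Here the background variance is
    # known; in practice it would be measured from a signal-free part of the record.
    print("residual variance  :", residuals.var(ddof=3))   # ddof = no. of fitted parameters
    print("background variance:", noise_sd ** 2)

    # A well-fitting function leaves residuals that change sign frequently,
    # rather than staying above or below zero over long runs.
    sign_changes = np.count_nonzero(np.diff(np.sign(residuals)))
    print("sign changes in residuals:", sign_changes, "of", residuals.size - 1)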

Are the parameters well-defined?

The aim of most curve fitting exercises is to obtain a well-defined set of function parameters (e.g. exponential time constants) that characterise the part of the signal being fitted. The standard errors of the best-fit parameters provide an indication of this. A large standard error indicates that a parameter is poorly defined by the data and can be varied substantially with little effect on the goodness of fit. Such a situation typically arises when there is insufficient information in the signal waveform to adequately define the function. For instance, in the case of exponential functions, the waveform data must be of sufficient duration to contain at least one time constant of the exponential before an accurate estimate can be obtained. Similarly, it proves difficult to estimate the time constants of multiple exponential functions accurately when they differ by less than a factor of 5.
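
As an illustration (again a Python/SciPy sketch, not part of WinWCP; amplitudes, time constants and noise level are invented), approximate parameter standard errors can be obtained from the square roots of the diagonal elements of the covariance matrix returned by the fit. Here the two true time constants, 5 ms and 10 ms, differ by only a factor of 2, so the reported errors on them can be inspected in the light of the warning above:

    import numpy as np
    from scipy.optimize import curve_fit

    def double_exp(t, a1, tau1, a2, tau2):
        return a1 * np.exp(-t / tau1) + a2 * np.exp(-t / tau2)

    rng = np.random.default_rng(1)
    t = np.linspace(0, 60e-3, 600)
    # True time constants 5 ms and 10 ms: less than a factor of 5 apart.
    y = double_exp(t, 0.6, 5e-3, 0.4, 10e-3) + rng.normal(0, 0.02, t.size)

    popt, pcov = curve_fit(double_exp, t, y, p0=(0.5, 3e-3, 0.5, 15e-3), maxfev=10000)
    std_errs = np.sqrt(np.diag(pcov))   # approximate standard errors of the parameters

    for name, val, err in zip(("a1", "tau1", "a2", "tau2"), popt, std_errs):
        print(f"{name:5s} = {val:.4g} +/- {err:.2g}")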

It is worth noting that the parameter "standard errors" discussed above are computed by the curve fitting program from the Hessian matrix, and are not true estimates of experimental standard error since they take no account of inter-cell or other sources of variability. In addition, they provide only a lower bound on the standard error of a parameter estimate. It can be shown (by simulation) that, if the random noise on the experimental signals is correlated, the variability of the fitted parameters may be substantially greater than suggested by the computed parameter standard errors. The error in parameter estimation can be a complex function of the parameter values and the signal-to-noise ratio of the data. It is therefore wise to test the curve fitting procedure using simulated waveforms with known parameter sets spanning the range of values likely to be observed in the experimental data.
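
A minimal sketch of such a test (Python/SciPy rather than WinWCP; the true parameters, noise level and number of repeats are arbitrary choices) is to fit many simulated records generated from known parameters and compare the observed spread of the fitted values with the standard errors reported by the fit:

    import numpy as np
    from scipy.optimize import curve_fit

    def single_exp(t, a, tau):
        return a * np.exp(-t / tau)

    rng = np.random.default_rng(2)
    t = np.linspace(0, 40e-3, 400)
    true_a, true_tau, noise_sd = 1.0, 8e-3, 0.05   # assumed "known" parameters

    fitted_taus, reported_errs = [], []
    for _ in range(200):                           # 200 simulated records
        y = single_exp(t, true_a, true_tau) + rng.normal(0, noise_sd, t.size)
        popt, pcov = curve_fit(single_exp, t, y, p0=(0.5, 5e-3))
        fitted_taus.append(popt[1])
        reported_errs.append(np.sqrt(pcov[1, 1]))

    # With uncorrelated noise the two figures should be comparable; with
    # correlated noise the observed spread can be substantially larger.
    print("true tau                 :", true_tau)
    print("observed spread of tau   :", np.std(fitted_taus))
    print("mean reported std. error :", np.mean(reported_errs))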

Are all the parameters meaningful?

It is also necessary to discriminate between functions that fit the data equally well. For instance, the question often arises as to whether one, two, or more exponential functions are needed to fit a signal waveform. It is usually obvious from the residuals plot when a single exponential does NOT provide a good fit. However, when a single exponential does fit, two or more exponentials will also provide a good fit. In such circumstances, it is usual to choose the function with the smallest number of parameters, on the principle of parsimony. An excess of function parameters also results in some of the parameters being ill-defined, with standard errors often larger than the parameter values themselves.
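
The sketch below (Python/SciPy, not WinWCP; the simulated record and noise level are invented) fits one and two exponentials to a genuinely single-exponential record, so that the residual variances and the standard errors of the extra parameters of the double-exponential fit can be compared:

    import numpy as np
    from scipy.optimize import curve_fit

    def one_exp(t, a, tau):
        return a * np.exp(-t / tau)

    def two_exp(t, a1, tau1, a2, tau2):
        return a1 * np.exp(-t / tau1) + a2 * np.exp(-t / tau2)

    rng = np.random.default_rng(3)
    t = np.linspace(0, 40e-3, 400)
    y = one_exp(t, 1.0, 8e-3) + rng.normal(0, 0.03, t.size)   # single-exponential signal

    p1, c1 = curve_fit(one_exp, t, y, p0=(0.5, 5e-3))
    p2, c2 = curve_fit(two_exp, t, y, p0=(0.5, 4e-3, 0.5, 20e-3), maxfev=10000)

    for label, p, c, f in (("1 exp", p1, c1, one_exp), ("2 exp", p2, c2, two_exp)):
        resid_var = np.var(y - f(t, *p), ddof=len(p))
        print(label, "residual variance:", resid_var)
        print(label, "best-fit values  :", p)
        print(label, "standard errors  :", np.sqrt(np.diag(c)))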

A more detailed discussion of the above issues can be found in Dempster (1992) and Dempster (2001).