Confidence Intervals in CFTool

General assumptions

CFTool uses one of three non-linear regression algorithms to compute the best fit of a given function to the data provided. As with most fitting methods, the resulting uncertainty estimates are meaningful only if some general assumptions hold reasonably well:

  • The sample is reasonably representative of the population.
  • The chosen fitting function is a sensible description of the data.
  • The residuals are not strongly correlated with one another.
  • The spread of the residuals is reasonably consistent across the fitted range.
  • The independent variables are measured with negligible error compared with the dependent variables.*

* Except in the case of orthogonal distance regression, where errors in the independent variables are also taken into account.

Where the coefficient errors come from

If these assumptions are reasonable, the fitting algorithm returns a covariance matrix for the fitted coefficients. The square root of each diagonal element gives the standard error of the corresponding coefficient. These standard errors are then used to construct confidence intervals.

So the quoted errors on the fitted coefficients do not come from the spread of repeated fits, but from the covariance matrix returned by the fitting procedure itself.
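
As a rough illustration of this step, the sketch below fits a straight line and takes the square root of the diagonal of the returned covariance matrix. It assumes SciPy's curve_fit as a stand-in for the fitting routine; the model, the data values and the variable names are made up for the example and are not taken from CFTool itself.

```python
import numpy as np
from scipy.optimize import curve_fit


def line(x, a, b):
    """Straight-line model y = a*x + b (illustrative choice only)."""
    return a * x + b


# Made-up data, for illustration only.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.1, 2.1, 3.9, 6.2, 7.8, 10.1])

# popt holds the best-fit coefficients, pcov their covariance matrix.
popt, pcov = curve_fit(line, x, y)

# Standard error of each coefficient: square root of the matching
# diagonal element of the covariance matrix.
std_errors = np.sqrt(np.diag(pcov))

for name, value, err in zip(("a", "b"), popt, std_errors):
    print(f"{name} = {value:.3f} +/- {err:.3f} (standard error)")
```

The off-diagonal elements of pcov describe how the coefficient estimates vary together; they are not used for the individual standard errors.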

The Student’s t correction

CFTool does not stop at the raw standard errors. It applies a correction using Student’s t distribution. This depends on the residual degrees of freedom, which is the number of fitted data points minus the number of fitted coefficients. For example, fitting y=ax+b to three data points leaves one degree of freedom, whereas fitting y=ax leaves two.

This matters because when there are only a few excess degrees of freedom, the uncertainty on the coefficients is greater than would be suggested by a simple large-sample Gaussian approximation. The Student’s t correction allows for this by giving a larger multiplier, and therefore wider confidence intervals. As the number of degrees of freedom increases, the Student’s t distribution approaches the ordinary normal distribution, so the difference becomes small.
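
To give a feel for the size of the effect, the sketch below compares the two-sided 95% Student's t multiplier with the Gaussian value of about 1.96 for a two-coefficient fit. It assumes SciPy's scipy.stats module; the helper name t_multiplier and the choice of a 95% level are illustrative assumptions.

```python
from scipy import stats


def t_multiplier(n_points, n_coeffs, level=0.95):
    """Two-sided Student's t multiplier for the given confidence level.

    Hypothetical helper: the residual degrees of freedom are the number
    of fitted data points minus the number of fitted coefficients.
    """
    dof = n_points - n_coeffs
    if dof < 1:
        raise ValueError("need at least one residual degree of freedom")
    return stats.t.ppf(0.5 + level / 2.0, dof)


# Large-sample Gaussian multiplier for the same 95% level.
gaussian = stats.norm.ppf(0.975)

# Two fitted coefficients (for example y = ax + b), varying data set size.
for n_points in (3, 4, 7, 32, 102):
    dof = n_points - 2
    print(f"dof = {dof:3d}: t multiplier = {t_multiplier(n_points, 2):7.3f}, "
          f"Gaussian = {gaussian:.3f}")
```

With one degree of freedom the multiplier is roughly 12.7 rather than 1.96, while at 100 degrees of freedom the two differ by only about one percent.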

In practical terms, CFTool estimates the coefficient errors from the covariance matrix returned by the fit, and then applies a Student’s t correction based on the residual degrees of freedom. For small data sets this usually gives a more cautious and more realistic estimate of coefficient uncertainty than a Gaussian approximation would.
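
The two steps can be combined as in the sketch below, which turns a set of fitted coefficients and their covariance matrix into confidence intervals. This is a minimal sketch under stated assumptions, not CFTool's actual code: the function name confidence_intervals, the 95% default level and the example numbers are all hypothetical.

```python
import numpy as np
from scipy import stats


def confidence_intervals(popt, pcov, n_points, level=0.95):
    """Return (lower, upper) bounds for each fitted coefficient.

    Hypothetical helper: standard errors come from the diagonal of the
    covariance matrix, and the multiplier comes from Student's t with
    n_points - n_coeffs residual degrees of freedom.
    """
    popt = np.asarray(popt, dtype=float)
    dof = n_points - popt.size
    if dof < 1:
        raise ValueError("need at least one residual degree of freedom")
    std_errors = np.sqrt(np.diag(pcov))
    multiplier = stats.t.ppf(0.5 + level / 2.0, dof)
    half_width = multiplier * std_errors
    return popt - half_width, popt + half_width


# Made-up fit results: two coefficients fitted to five data points.
popt = np.array([2.0, 0.5])
pcov = np.array([[0.04, -0.01],
                 [-0.01, 0.09]])

lower, upper = confidence_intervals(popt, pcov, n_points=5)
for name, lo, hi in zip(("a", "b"), lower, upper):
    print(f"{name}: 95% interval from {lo:.3f} to {hi:.3f}")
```

Here there are three residual degrees of freedom, so the multiplier is about 3.18 and each interval is noticeably wider than the 1.96 standard errors a Gaussian approximation would give.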

Important limitations

These confidence intervals should still be regarded as approximate, especially for non-linear fitting, because they are based on the local covariance estimate of the fitted model. If the function is poorly constrained, or if the model is not a good description of the data, the reported coefficient errors may still be misleading.

In weighted fitting, the interpretation also depends on whether the supplied sigma values are being treated as relative weights or as absolute measurement errors. If absolute sigma is selected, the supplied errors set the scale of the covariance estimate directly. If relative sigma is selected, the covariance is scaled using the scatter of the residuals.
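
The difference between the two treatments can be seen in the sketch below. It assumes SciPy's curve_fit, whose absolute_sigma flag has the same two meanings; whether CFTool calls this routine internally is not stated here, and the data and sigma values are made up.

```python
import numpy as np
from scipy.optimize import curve_fit


def line(x, a, b):
    """Straight-line model y = a*x + b (illustrative choice only)."""
    return a * x + b


# Made-up data with the same quoted error on every point.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.1, 2.1, 3.9, 6.2, 7.8, 10.1])
sigma = np.full_like(x, 0.5)

# Absolute sigma: the supplied errors set the covariance scale directly.
popt_abs, pcov_abs = curve_fit(line, x, y, sigma=sigma, absolute_sigma=True)

# Relative sigma: the covariance is rescaled by the residual scatter.
popt_rel, pcov_rel = curve_fit(line, x, y, sigma=sigma, absolute_sigma=False)

# The rescaling factor is the reduced chi-squared of the weighted fit.
dof = x.size - popt_rel.size
chi2_red = np.sum(((y - line(x, *popt_rel)) / sigma) ** 2) / dof

print("standard errors (absolute sigma):", np.sqrt(np.diag(pcov_abs)))
print("standard errors (relative sigma):", np.sqrt(np.diag(pcov_rel)))
print("reduced chi-squared:", chi2_red)
```

The best-fit coefficients are the same in both cases; only the covariance, and therefore the reported errors, changes. If the supplied sigma values overstate the real scatter, the reduced chi-squared is below one and the relative-sigma errors come out smaller than the absolute-sigma ones.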

Very small data sets

CFTool will still plot a result for a fit with only a very small number of excess degrees of freedom, but it will warn you that the uncertainty and quality-of-fit values may not be statistically significant. This is not a mistake in the program; it is a consequence of asking for interval estimates from a fit with very limited statistical support.