Selecting the best fitting
As you can see from the results above, you can fit any polynomial to a set of
data. The question arises: which polynomial provides the best fit for the data?
To help decide on the best fit, we can use several criteria:
The correlation coefficient, r. This value is constrained to the range
−1 < r < 1. The closer r is to +1 or −1, the better the fit.
The sum of squared errors, SSE. This is the quantity that is minimized
by the least-squares approach.
A plot of residuals. This is a plot of the error corresponding to each
of the original data points. If these errors are completely random, the
residuals plot should show no particular trend.
Before attempting to program these criteria, we present some definitions:
Given the vectors x and y of data to be fit to the polynomial equation, we
form the matrix X and use it to calculate a vector of polynomial coefficients b.
We can calculate a vector of fitted data, y', by using y' = X⋅b.
An error vector is calculated by e = y – y'.
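The definitions above can be sketched in a few lines of NumPy (this is only an illustration, not the calculator program discussed in this chapter; the function name and the use of np.vander and np.linalg.lstsq are choices made here for the sketch):

```python
import numpy as np

def fit_polynomial(x, y, n):
    """Fit an n-th degree polynomial to the data (x, y) by least squares."""
    # Matrix X: one column per power of x, from x^0 up to x^n.
    X = np.vander(x, n + 1, increasing=True)
    # Coefficient vector b is the least-squares solution minimizing |y - X.b|.
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_fit = X @ b   # fitted data  y' = X.b
    e = y - y_fit   # error vector e = y - y'
    return b, y_fit, e
```

For data generated exactly by a polynomial of the chosen degree, the error vector e comes back (numerically) zero and b recovers the generating coefficients.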
The sum of squared errors is equal to the square of the magnitude of the error
vector, i.e.,

SSE = |e|² = Σ eᵢ² = Σ (yᵢ − y'ᵢ)²

To calculate the correlation coefficient we need to calculate first what is
known as the sum of squared totals, SST, defined as

SST = Σ (yᵢ − ȳ)², where ȳ is the mean value of the original y values,
i.e., ȳ = (Σ yᵢ)/n.

In terms of SSE and SST, the correlation coefficient is defined by

r = [1 − (SSE/SST)]^(1/2)

Here is the new program including calculation of SSE and r (once more,
consult the last page of this chapter to see how to produce the variable and
command names in the program):
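Independently of the calculator program, the SSE, SST, and r formulas above can be cross-checked with a short Python sketch (the function name is assumed here, and r is taken as the nonnegative root, so a perfect fit gives r = 1):

```python
import numpy as np

def fit_quality(y, y_fit):
    """Return (SSE, SST, r) for observed data y and fitted data y_fit."""
    e = y - y_fit                        # error vector e = y - y'
    SSE = np.sum(e**2)                   # sum of squared errors, |e|^2
    SST = np.sum((y - np.mean(y))**2)    # sum of squared totals
    r = np.sqrt(1.0 - SSE / SST)         # correlation coefficient
    return SSE, SST, r
```

When y_fit equals y exactly, SSE = 0 and r = 1, the best possible fit by the first two criteria listed above.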