Lecture 19: Least squares fitting#

Motivation#

  • Suppose we are given data points representing a quantity over time.

  • We want to represent the graph of this data as a simple curve.

  • The examples on the following slide show the fitting of two different curves through some experimental data:

    • a straight line;

    • a quadratic curve.

Weekly earnings#


Raw data for weekly earnings:

../_images/66db4f3e3a38dfdad3a7343dc955e29810f99b5f60818916d7cf0b9a8b96088b.png

Raw data for weekly earnings with a straight line of best fit:

../_images/b87df46c9bfa22e57d5995abe0bafd612ae48dea11e85706f74e08e3e8f0c616.png

Raw data for weekly earnings with a quadratic curve of best fit:

../_images/cb79fbdc6115b5ad6e1fbcd2e4ef7c81cbc9c08ab2ee08969cc312b5fbf9c4ae.png

Data source: ONS [xls]

An example of best linear fit#

Suppose that the following measured data, \(y\), is observed at different times \(t\):

\[\begin{split} \begin{array}{c|cccc} t & 1 & 2 & 3 & 4 \\ \hline y & 1 & 1.5 & 2.5 & 3.5 \end{array} \end{split}\]
../_images/e827fe12b71547fb5cfc72e7051e8a76cbe0b41383fdd710a3c8db53aa8aa97f.png
  • Consider representing this data as a straight line:

    \[ y = mt + c \]
  • An exact fit would require the following equations to be satisfied:

    \[\begin{split} \begin{aligned} m \times 1 + c & = 1 \\ m \times 2 + c & = 1.5 \\ m \times 3 + c & = 2.5 \\ m \times 4 + c & = 3.5. \end{aligned} \end{split}\]

    We recognise this as a system of linear equations for \((m, c)\) - but there are too many equations!!
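    A minimal sketch of how this overdetermined system can be assembled as a matrix problem \(A \vec{x} = \vec{b}\), assuming NumPy is available (not something used in the lecture itself):

    ```python
    import numpy as np

    # Data from the table above
    t = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([1.0, 1.5, 2.5, 3.5])

    # One row per equation m * t_i + c = y_i, unknowns ordered as (m, c)
    A = np.column_stack([t, np.ones_like(t)])
    b = y

    print(A.shape)  # (4, 2): four equations but only two unknowns
    ```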

An example of best quadratic fit#

Suppose that the following measured data, \(y\), is observed at different times \(t\):

\[\begin{split} \begin{array}{c|ccccc} t & -1.0 & -0.5 & 0 & 0.5 & 1.0 \\ \hline y & 1.0 & 0.5 & 0.0 & 0.5 & 2.0 \end{array} \end{split}\]
../_images/a674caf8767f3e076c00589b8380f1b44e84607411f86a661c361d6bd50ac46b.png
  • Consider representing this data as a quadratic curve:

    \[ y = a + b t + c t^2 \]
  • An exact fit would require the following equations to be satisfied:

    \[\begin{split} \begin{aligned} a + b \times (-1) + c \times (-1)^2 & = 1 \\ a + b \times (-0.5) + c \times (-0.5)^2 & = 0.5 \\ a + b \times 0 + c \times 0^2 & = 0 \\ a + b \times 0.5 + c \times 0.5^2 & = 0.5 \\ a + b \times 1 + c \times 1^2 & = 2. \end{aligned} \end{split}\]

    We recognise this as a system of linear equations for \((a, b, c)\) - but there are too many equations!!
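    The same construction applies here, with one column per unknown coefficient; again a sketch assuming NumPy:

    ```python
    import numpy as np

    # Data from the table above
    t = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
    y = np.array([1.0, 0.5, 0.0, 0.5, 2.0])

    # Columns correspond to the unknowns (a, b, c) in y = a + b t + c t^2
    A = np.column_stack([np.ones_like(t), t, t**2])
    b = y

    print(A.shape)  # (5, 3): five equations but only three unknowns
    ```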

Best approximation#

  • Given that these overdetermined systems of equations have no exact solution, what should we do?

  • Recall the definition of the residual for the system \(A \vec{x} = \vec{b}\):

    \[ \vec{r} = \vec{b} - A \vec{x}, \]
    • When \(\vec{r} = \vec{0}\), \(\vec{x} = \vec{x}^*\), the exact solution.

    • When there is no exact solution the next best thing is to make \(\vec{r}\) as small as possible.

    • This means finding \(\vec{x}\) that minimises \(\| \vec{r} \|^2\).

The normal equations#

It turns out that the \(\vec{x}\) that minimises \(\| \vec{b} - A \vec{x} \|^2\) is the same \(\vec{x}\) that satisfies the following square system of equations:

\[ A^T A \vec{x} = A^T \vec{b}. \]
  • These equations are referred to as the normal equations for the overdetermined system \(A \vec{x} = \vec{b}\).

  • The square matrix \(A^T A\) is non-singular provided the columns of \(A\) are linearly independent, in which case the solution is unique.

  • You can find this solution using Gaussian elimination (for example).

  • The normal equations, when solved, give the best solution to the original problem in the sense of minimising the Euclidean norm of the residual.
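A minimal sketch of this approach, assuming NumPy; the helper name `normal_equations` is ours for illustration, not a library routine:

```python
import numpy as np

def normal_equations(A, b):
    """Least squares solution of the overdetermined system A x = b,
    found by forming and solving the normal equations A^T A x = A^T b."""
    AtA = A.T @ A
    Atb = A.T @ b
    # Solve the square system, e.g. by Gaussian elimination (LU factorisation)
    return np.linalg.solve(AtA, Atb)
```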

Example 1#

  1. Find the least squares approximation to the quadratic fit example.

The residual is given by

\[\begin{split} \begin{aligned} \vec{r} & = \vec{b} - A \vec{x} \\ & = \begin{pmatrix} 1 \\ 0.5 \\ 0 \\ 0.5 \\ 2 \end{pmatrix} - \begin{pmatrix} 1 & -1 & 1 \\ 1 & -0.5 & 0.25 \\ 1 & 0 & 0 \\ 1 & 0.5 & 0.25 \\ 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \end{aligned} \end{split}\]

We want to find \(\vec{x}\) that minimises

\[ \| \vec{r} \|^2 = \| \vec{b} - A \vec{x} \|^2. \]

The left and right hand sides of the normal equations are given by

\[\begin{split} \begin{aligned} A^T A & = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ -1 & -0.5 & 0 & 0.5 & 1 \\ 1 & 0.25 & 0 & 0.25 & 1 \end{pmatrix} \begin{pmatrix} 1 & -1 & 1 \\ 1 & -0.5 & 0.25 \\ 1 & 0 & 0 \\ 1 & 0.5 & 0.25 \\ 1 & 1 & 1 \end{pmatrix} \\ & = \begin{pmatrix} 5 & 0 & 2.5 \\ 0 & 2.5 & 0 \\ 2.5 & 0 & 2.125 \end{pmatrix} \end{aligned} \end{split}\]

and

\[\begin{split} A^T \vec{b} = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ -1 & -0.5 & 0 & 0.5 & 1 \\ 1 & 0.25 & 0 & 0.25 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 0.5 \\ 0 \\ 0.5 \\ 2 \end{pmatrix} = \begin{pmatrix} 4 \\ 1 \\ 3.25 \end{pmatrix} \end{split}\]

We can apply Gaussian elimination to solve this problem (only one row operation is required!). The algorithm gives:

\[\begin{split} \begin{pmatrix} 5 & 0 & 2.5 \\ 0 & 2.5 & 0 \\ 0 & 0 & 0.875 \\ \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 4 \\ 1 \\ 1.25 \end{pmatrix}, \end{split}\]

and backward substitution gives (to 4 significant figures)

\[ \vec{x} = (0.08571, 0.4000, 1.429). \]
../_images/6719d6bec77da993369a6750a2f00a775e98608d56577d51fd1a758e70682f7e.png
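The hand computation above can be cross-checked numerically; a sketch assuming NumPy:

```python
import numpy as np

t = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
b = np.array([1.0, 0.5, 0.0, 0.5, 2.0])
A = np.column_stack([np.ones_like(t), t, t**2])

# Form and solve the normal equations for this example
x = np.linalg.solve(A.T @ A, A.T @ b)
print(x)  # approximately [0.08571, 0.4, 1.42857]
```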

Example 2 (homework)#

  1. Find the least squares approximation to the system given by

    \[\begin{split} \begin{pmatrix} -2 & 2 & -1 \\ 0 & 1 & 0 \\ 3 & 1 & -2 \\ 1 & 1 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \\ 0 \\ -3 \end{pmatrix} \end{split}\]
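If you want to check your hand-worked answer, one option is to compare it against NumPy's built-in least squares solver (a cross-check only, not the intended method of solution):

```python
import numpy as np

A = np.array([[-2.0, 2.0, -1.0],
              [ 0.0, 1.0,  0.0],
              [ 3.0, 1.0, -2.0],
              [ 1.0, 1.0,  4.0]])
b = np.array([2.0, 1.0, 0.0, -3.0])

# np.linalg.lstsq solves the least squares problem directly
x, residual, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(x)
```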

Problems with the normal equations#

Consider the matrix \(A = \begin{pmatrix} 1 & 1 \\ \varepsilon & 0 \\ 0 & \varepsilon \end{pmatrix}\), which gives

\[\begin{split} A^T A = \begin{pmatrix} 1 + \varepsilon^2 & 1 \\ 1 & 1 + \varepsilon^2 \end{pmatrix}. \end{split}\]

If \(\varepsilon \approx \sqrt{\text{eps}}\), where eps is the machine precision, then rounding error can make the computed \(A^T A\) singular, since \(1 + \varepsilon^2\) may round to \(1\), even though \(A\) itself has full column rank.

See also: Nick Higham, Seven sins of numerical linear algebra
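The following sketch (assuming NumPy, and taking \(\varepsilon = 10^{-8} \approx \sqrt{\text{eps}}\) in double precision) illustrates the problem:

```python
import numpy as np

eps = np.finfo(float).eps   # machine precision, about 2.2e-16
e = 1.0e-8                  # roughly sqrt(eps)

A = np.array([[1.0, 1.0],
              [e,   0.0],
              [0.0, e  ]])
AtA = A.T @ A

# 1 + e**2 rounds to exactly 1.0 in double precision, so the computed
# A^T A is exactly the singular matrix [[1, 1], [1, 1]]
print(1.0 + e**2 == 1.0)           # True
print(np.linalg.matrix_rank(A))    # 2: A has full column rank
print(np.linalg.matrix_rank(AtA))  # 1: the computed A^T A is singular
```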

Sensitivity and conditioning#

  • The condition number of a square matrix \(A\) describes how close that matrix is to being singular.

  • If the condition number is large then \(A\) is “close” to being singular.

  • When the condition number is very large it is likely that the effects of rounding errors will be most serious: we refer to such a system as being ill-conditioned.

  • The normal equations are typically quite ill-conditioned, so it is essential to use arithmetic with sufficient precision and specialised algorithms to solve them.
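For a full-rank matrix \(A\), the 2-norm condition number of \(A^T A\) is the square of that of \(A\), so forming the normal equations roughly doubles the number of digits lost to rounding; library routines such as `numpy.linalg.lstsq` work with \(A\) directly (via an SVD-based method) rather than forming \(A^T A\). A small sketch, assuming NumPy and using a made-up polynomial fitting matrix, illustrates the squaring effect:

```python
import numpy as np

# A made-up design matrix for fitting a quartic on [0, 1]
t = np.linspace(0.0, 1.0, 20)
A = np.column_stack([t**k for k in range(5)])

print(np.linalg.cond(A))        # condition number of A
print(np.linalg.cond(A.T @ A))  # roughly the square of the above
```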

Summary#

  • Overdetermined systems are common in data modelling and they typically have no solution.

  • Even so, it is possible to find a closest solution to the problem.

  • It is common to measure this closeness in the Euclidean norm and this leads naturally to the least squares approximation.

  • A solution to this problem can be found by solving the normal equations, but care must be taken to use arithmetic with sufficient precision and specialised algorithms.

Further reading#

The slides used in the lecture are also available.