Lecture 19: Least squares fitting#

… time to starting thinking about ~~Christmas~~ ~~data~~ coursework! :(

Module feedback: https://leeds.bluera.com/leeds

../_images/dd30d23172d7662f0d1e7f6697e8db9486372acd688be91f3304aca9ea9db020.png

Motivation#

  • Suppose we are given data point representing a quantity over time.

  • We want to represent the graph of this data as a simple curve.

  • The examples on the following slide show the fitting of two different curves through some experimental data:

    • a straight line;

    • a quadratic curve.

Weekly earnings#

Raw data for average UK weekly earnings since 2008 (from ONS)

../_images/de71c41c2b4f45e2a83a572fa5d674ba2c7019ca0961eec49490c2be542d6f0b.png

Raw data for weekly earning with a straight line of best fit:

../_images/eba2ac4f732a609c93a86e6abeba98efb8aa7e172c880517102453e8391b000b.png

Raw data for weekly earnings with a quadric curve of best fit:

../_images/c51b3c8451b496a633ae423c9e1ca81199772ef86c9f27114a2dd2b6a2dc0c05.png

…. how do we do this….

An example of best linear fit#

Suppose that the following measured data, \(y\), is observed at different times \(t\):

\[\begin{split} \begin{array}{c|cccc} t & 1 & 2 & 3 & 4 \\ \hline y & 1 & 1.5 & 2.5 & 3.5 \end{array} \end{split}\]
../_images/c1d8547d088336e0a9ddb9ced6aab8e6a1a61f68c91697184c394f47b00a4b1d.png
\[\begin{split} \begin{array}{c|cccc} t & 1 & 2 & 3 & 4 \\ \hline y & 1 & 1.5 & 2.5 & 3.5 \end{array} \end{split}\]
  • Suppose we want to represent the data using a straight line: \(y = m t + c\)

  • An exact fit would require the following equations to be satisfied:

\[\begin{split} \begin{aligned} m \times 1 + c & = 1 \\ m \times 2 + c & = 1.5 \\ m \times 3 + c & = 2.5 \\ m \times 4 + c & = 3.5. \end{aligned} \end{split}\]

This is a system of linear equations for \((m , c)\) but there are too many equations!

An example of best quadratic fit#

Suppose that the following measured data, \(y\), is observed at different times \(t\):

\[\begin{split} \begin{array}{c|ccccc} t & -1.0 & -0.5 & 0 & 0.5 & 1.0 \\ \hline y & 1.0 & 0.5 & 0.0 & 0.5 & 2.0 \end{array} \end{split}\]
../_images/31d16ac656c6e43cd93b61e2536059cef64f6b5012801fc5fd17662cc5fbd69a.png
\[\begin{split} \begin{array}{c|ccccc} t & -1.0 & -0.5 & 0 & 0.5 & 1.0 \\ \hline y & 1.0 & 0.5 & 0.0 & 0.5 & 2.0 \end{array} \end{split}\]
  • Consider representing this data as a quadratic line:

    \[ y = a + b t + c t^2 \]
  • An exact fit would require the following equations to be satisfied:

    \[\begin{split} \begin{aligned} a + b \times -1 + c \times (-1)^2 & = 1 \\ a + b \times -0.5 + c \times (-0.5)^2 & = 0.5 \\ a + b \times 0 + c \times 0^2 & = 0 \\ a + b \times 0.5 + c \times 0.5^2 & = 0.5 \\ a + b \times 1 + c \times 1^2 & = 2. \end{aligned} \end{split}\]

We recognise this as a system of linear equations for \((a, b, c)\) - but there are too many equations!!

Best approximation#

Given that there is no exact fit solution to these overdetermined systems of equations what should we do?

What can we do?#

Recall the definition of the residual for the system \(A \vec{x} = \vec{b}\):

\[ \vec{r} = \vec{b} - A \vec{x}, \]
  • When \(\vec{r} = \vec{0}\), then \(\vec{x} = \vec{x}^*\) the exact solution.

  • When there is no exact solution the next best thing is to make \(\vec{r}\) as small as possible.

  • This means finding \(\vec{x}\) that minimises \(\| \vec{r} \|^2\).

The normal equations#

It turns out that the \(\vec{x}\) that minimises \(\|\vec{b} - A \vec{x}\|^2\) is the same \(\vec{x}\) that satisfies the following square system of equations:

\[ A^T A \vec{x} = A^T \vec{b} \]

The normal equations#

  • These equations are referred to as the normal equations for the overdetermined system \(A \vec{x} = \vec{b}\).

  • The square matrix \(A^T A\) is generally non-singular (i.e., the solution is unique).

  • You can find this solution using Gaussian elimination (for example).

  • The normal equations, when solved, give the best solution to the original problem in the sense of minimising the Euclidean norm of the residual.

Example 1#

Find the least squares approximation to the match the data:

\[\begin{split} \begin{array}{c|ccccc} t & -1.0 & -0.5 & 0 & 0.5 & 1.0 \\ \hline y & 1.0 & 0.5 & 0.0 & 0.5 & 2.0 \end{array} \end{split}\]
../_images/748d8b97c8d29c0db01ba3639ca91cd3a26d13c160d8ac48df4f1da1bb8e57a9.png

Example 2 (for you!)#

Find the least square approximation to the system given by

\[\begin{split} \begin{pmatrix} -2 & 2 & -1 \\ 0 & 1 & 0 \\ 3 & 1 & -2 \\ 1 & 1 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \\ 0 \\ -3 \end{pmatrix} \end{split}\]
A = np.array([[-2.0, 2.0, -1.0], [0.0, 1.0, 0.0], [3.0, 1.0, -2.0], [1.0, 1.0, 4.0]])
b = np.array([[2.0], [1.0], [0.0], [-3.0]])

x = np.linalg.solve(A.T @ A, A.T @ b)
x
array([[-0.5       ],
       [ 0.28571429],
       [-0.66666667]])

Problems with the normal equations#

Consider the matrix \(A = \begin{pmatrix} 1 & 1 \\ \varepsilon & 0 \\ 0 & \varepsilon \end{pmatrix}\), which gives

\[\begin{split} A^T A = \begin{pmatrix} 1 + \varepsilon^2 & 1 \\ 1 & 1 + \varepsilon^2. \end{pmatrix} \end{split}\]

If \(\varepsilon \approx \sqrt{eps}\) then the effects of rounding error can make \(A^T A\) appear to be singular.

See also: Nick Higham, Seven sins of numerical linear algebra

Sensitivity and conditioning#

  • The condition number of a square matrix \(A\) describes how close that matrix is to being singular.

  • If the condition number is larger then \(A\) is “close” to being singular.

  • When the condition number is very large it is likely that the effects of rounding errors will be most serious: we refer to such a system as being ill-conditioned.

  • The normal equations are typically quite ill-conditioned and so it is essential to use high precision arithmetic to solve them and specialised algorithms.

Summary#

  • Overdetermined systems are common in data modelling and they typically have no solution.

  • Even so, it is possible to find a closest solution to the problem.

  • It is common to measure this closeness in the Euclidean norm and this leads naturally to the least squares approximation.

  • A solution to this problem can be found by solving the normal equations but can must be taken to use arithmetic with sufficient precision and specialised algorithms.