# Lecture 09: Iterative solutions of linear equations

## Direct methods
Gaussian elimination and LU factorisation are characterised as direct methods.
They are guaranteed to produce a solution with a fixed amount of work (in exact arithmetic).
This fixed amount of work may be very large.
For a general \(n \times n\) matrix system \(A \vec{x} = \vec{b}\):

- the computational expense of all the direct methods is \(O(n^3)\), i.e. increasing \(n\) by a factor of 10 increases the computational work by a factor of 1000;
- the amount of storage required for these approaches is \(O(n^2)\), and is dominated by the matrix \(A\);
- as \(n\) becomes large, the storage and computational work required limit the practicality of the method.
## Iterative methods
It is possible to construct an iteration which will improve an approximation \(\vec{x}^{(k)}\) to the solution of (linear) equations \(A \vec{x} = \vec{b}\).
The iteration takes the form

\[ \vec{x}^{(k+1)} = \vec{F} \left( \vec{x}^{(k)} \right), \]

where \(\vec{x}^{(k)}\) is now a vector of values and \(\vec{F}\) is some vector function, yet to be defined.
Often a reasonable approximation to \(\vec{x}\) is available which can be used to initiate the iterative method.
Sometimes there is no obvious initial approximation available. The method may still converge but take more iterations.
When should we stop?
## Some very bad examples

**Example 1**

Consider the iteration

\[ \vec{x}^{(k+1)} = \vec{x}^{(k)}. \]

Each iteration is very cheap to compute but very inaccurate - it never converges!
**Example 2**

Consider the iteration

\[ \vec{x}^{(k+1)} = \vec{x}^{(k)} + A^{-1} \left( \vec{b} - A \vec{x}^{(k)} \right). \]

Each iteration is very expensive to compute - you have to invert \(A\)! - but it converges in just one step since

\[ \vec{x}^{(1)} = \vec{x}^{(0)} + A^{-1} \vec{b} - A^{-1} A \vec{x}^{(0)} = A^{-1} \vec{b} = \vec{x}. \]
**Key idea**

Construct the iteration

\[ \vec{x}^{(k+1)} = \vec{x}^{(k)} + P \left( \vec{b} - A \vec{x}^{(k)} \right) \]

for some matrix \(P\) such that:

- \(P\) is easy to compute, or the matrix-vector product \(P \vec{r}\) is easy to compute;
- \(P\) approximates \(A^{-1}\) well enough that the algorithm converges in few iterations.
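As a concrete illustration, here is a minimal sketch of this general iteration in Python with NumPy. The function name, the tolerance `tol` and the iteration limit `max_iter` are illustrative choices, not part of the lecture material:

```python
import numpy as np

def preconditioned_iteration(A, b, P, x0, tol=1e-8, max_iter=100):
    """Generic iteration x <- x + P (b - A x) for an approximation P of A^{-1}."""
    x = x0.astype(float)
    for k in range(max_iter):
        r = b - A @ x                # residual of the current approximation
        if np.linalg.norm(r) < tol:  # stop once the residual is small enough
            return x, k
        x = x + P @ r                # apply the correction P r
    return x, max_iter
```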
## Jacobi iteration

Consider the system \(A \vec{x} = \vec{b}\) but rewrite the matrix \(A\) as

\[ A = D + (A - D), \]

where \(D\) is the diagonal of \(A\), i.e.

\[ D = \begin{pmatrix} a_{11} & & \\ & \ddots & \\ & & a_{nn} \end{pmatrix}. \]

Manipulations give

\[ D \vec{x} = \vec{b} - (A - D) \vec{x} \quad \Rightarrow \quad \vec{x} = \vec{x} + D^{-1} \left( \vec{b} - A \vec{x} \right). \]

That is, take \(P = D^{-1}\)!

The Jacobi iteration is given by

\[ \vec{x}^{(k+1)} = \vec{x}^{(k)} + D^{-1} \left( \vec{b} - A \vec{x}^{(k)} \right). \]

- \(D\) is a diagonal matrix, so \(D^{-1}\) is trivial to form (as long as the diagonal entries are all nonzero).
- \(\vec{r}^{(k)} = \vec{b} - A \vec{x}^{(k)}\) is called the residual.
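A minimal sketch of the Jacobi iteration in this matrix form, assuming NumPy and a fixed number of sweeps (the function name and arguments are illustrative):

```python
import numpy as np

def jacobi(A, b, x0, num_iter=25):
    """Jacobi iteration x <- x + D^{-1} (b - A x), with D the diagonal of A."""
    x = x0.astype(float)
    d = np.diag(A)         # diagonal entries a_ii (assumed nonzero)
    for _ in range(num_iter):
        r = b - A @ x      # residual r = b - A x
        x = x + r / d      # elementwise division by d applies D^{-1}
    return x
```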
### Notes

- The cost of one iteration is \(O(n^2)\) for a full matrix, and is dominated by the matrix-vector product \(A \vec{x}^{(k)}\).
- This cost can be reduced to \(O(n)\) if the matrix \(A\) is sparse - this is when iterative methods are especially attractive (see next lecture, and the brief sketch after this list).
- The amount of work also depends on the number of iterations required to get a “satisfactory” solution:
  - the number of iterations depends on the matrix;
  - fewer iterations are needed for a less accurate solution;
  - a good initial estimate \(\vec{x}^{(0)}\) reduces the required number of iterations.
- Unfortunately, the iteration might not converge!
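As a rough illustration of the sparse case, the sketch below builds a tridiagonal test matrix with `scipy.sparse` (listed under further reading) and runs Jacobi updates on it; the matrix, size and iteration count are invented for illustration. Each matrix-vector product touches only the stored nonzeros, so one iteration costs \(O(n)\) rather than \(O(n^2)\):

```python
import numpy as np
import scipy.sparse as sp

n = 10_000
# Sparse tridiagonal test matrix: only O(n) entries are stored.
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)
x = np.zeros(n)

d = A.diagonal()       # diagonal entries a_ii of A
for _ in range(50):
    r = b - A @ x      # sparse matrix-vector product: O(n) work
    x = x + r / d      # one Jacobi update x <- x + D^{-1} r
```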
### Component by component

The Jacobi iteration updates all elements of \(\vec{x}^{(k)}\) simultaneously to get \(\vec{x}^{(k+1)}\).

Writing the method out component by component gives

\[ x_i^{(k+1)} = \frac{1}{a_{ii}} \left( b_i - \sum_{\substack{j = 1 \\ j \ne i}}^{n} a_{ij} x_j^{(k)} \right), \qquad i = 1, 2, \ldots, n. \]

Note that once the first step has been taken, \(x_1^{(k+1)}\) is already known, but the Jacobi iteration does not make use of this information!
## Gauss-Seidel iteration

Alternatively, the iteration might use each \(x_i^{(k+1)}\) as soon as it is calculated, giving

\[ x_i^{(k+1)} = \frac{1}{a_{ii}} \left( b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k+1)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k)} \right), \qquad i = 1, 2, \ldots, n. \]
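The contrast between the two component-by-component formulae can be sketched as plain Python loops (function names are illustrative); the only difference is which iterate the off-diagonal sum reads from:

```python
import numpy as np

def jacobi_sweep(A, b, x):
    """One Jacobi sweep: every update reads only the old iterate x."""
    n = len(b)
    x_new = np.empty(n)
    for i in range(n):
        s = sum(A[i, j] * x[j] for j in range(n) if j != i)
        x_new[i] = (b[i] - s) / A[i, i]
    return x_new

def gauss_seidel_sweep(A, b, x):
    """One Gauss-Seidel sweep: updates are used as soon as they are computed."""
    n = len(b)
    x = x.astype(float)
    for i in range(n):
        # x already holds the new values x_j^{(k+1)} for j < i
        s = sum(A[i, j] * x[j] for j in range(n) if j != i)
        x[i] = (b[i] - s) / A[i, i]
    return x
```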
Consider the system \(A \vec{x} = \vec{b}\) but rewrite the matrix \(A\) as

\[ A = D + L + U, \]

where \(D\) is the diagonal of \(A\), \(L\) contains the elements below the diagonal and \(U\) contains the elements above the diagonal.

The system can be manipulated to give

\[ (D + L) \vec{x} = \vec{b} - U \vec{x} \quad \Rightarrow \quad \vec{x} = \vec{x} + (D + L)^{-1} \left( \vec{b} - A \vec{x} \right), \]

and hence the Gauss-Seidel iteration

\[ \vec{x}^{(k+1)} = \vec{x}^{(k)} + (D + L)^{-1} \left( \vec{b} - A \vec{x}^{(k)} \right). \]

That is, \(P = (D + L)^{-1}\).
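Since \(D + L\) is lower triangular, its “inverse” can be applied by forward substitution rather than formed explicitly. A minimal sketch assuming SciPy (function and argument names are illustrative):

```python
import numpy as np
from scipy.linalg import solve_triangular

def gauss_seidel(A, b, x0, num_iter=25):
    """Gauss-Seidel iteration x <- x + (D + L)^{-1} (b - A x)."""
    x = x0.astype(float)
    DL = np.tril(A)    # lower triangle of A, including the diagonal
    for _ in range(num_iter):
        r = b - A @ x  # residual
        x = x + solve_triangular(DL, r, lower=True)  # forward substitution
    return x
```

Each forward substitution reproduces exactly the component-by-component sweep above, just written in matrix form.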
## Example 1

Take two iterations of (i) the Jacobi iteration and (ii) the Gauss-Seidel iteration to approximate the solution of the following system using the initial guess \(\vec{x}^{(0)} = (1, 1)^T\):

What happens if the initial estimate is altered to \(\vec{x}^{(0)} = (2, 1)^T\)? (Homework.)

Note that the exact solution to this system is \(x_1 = 1.5\) and \(x_2 = 0.5\).
### (i) Jacobi iteration
Starting from \(\vec{x}^{(0)} = (1, 1)^T\) we have
Iteration 1:
Iteration 2:
Note that the only difference between the formulae for Iteration 1 and Iteration 2 is the iteration number, i.e. the superscript in brackets.
### (ii) Gauss-Seidel iteration
Starting from \(\vec{x}^{(0)} = (1, 1)^T\) we have
Iteration 1:
Iteration 2:
Again, note the iteration numbers on the right-hand side of these equations, and in particular where they differ from the Jacobi method.
## Example 2 (homework)
Take one iteration of (i) the Jacobi iteration and (ii) the Gauss-Seidel iteration to approximate the solution of the following system using the initial guess \(\vec{x}^{(0)} = (1, 2, 3)^T\):
Note that the exact solution to this system is \(x_1 = 2, x_2 = 2, x_3 = 2\).
### Notes

- Here both methods converge, but fairly slowly - and for other systems they might not converge at all!
- We will discuss convergence and stopping criteria in the next lecture.
- The Gauss-Seidel iteration generally outperforms the Jacobi iteration.
- Performance can depend on the order in which the equations are written.
- Both iterative algorithms can be made faster and more efficient for sparse systems of equations (far more so than direct methods).
## Summary

- Many complex computational problems simply cannot be solved with today’s computers using direct methods.
- Iterative methods are used instead, since they can massively reduce the computational cost and storage required to get a “good enough” solution.
- These basic iterative methods are simple to describe and program, but generally slow to converge to an accurate answer - typically \(O(n)\) iterations are required.
- Their usefulness for general matrix systems is therefore very limited - next lecture we will show their value in the solution of sparse systems.
- More advanced iterative methods do exist but are beyond the scope of this module - see final year projects, MSc projects, PhD, and beyond!
## Further reading

- Wikipedia: Jacobi method
- Wikipedia: Gauss-Seidel method
- Wikipedia: Iterative methods - see also the Richardson method, the damped Jacobi method, successive over-relaxation (SOR), symmetric successive over-relaxation (SSOR) and Krylov subspace methods
- Jack Dongarra, Templates for the Solution of Linear Systems: Stationary iterative methods
- Golub, Gene H.; Van Loan, Charles F. (1996). Matrix Computations (3rd ed.). Baltimore: Johns Hopkins. ISBN 978-0-8018-5414-9.
- Saad, Yousef (2003). Iterative Methods for Sparse Linear Systems (2nd ed.). SIAM. p. 414. ISBN 0898715342.
- scipy.sparse iterative solvers: Solving linear problems
- PETSc: Linear system solvers - a high-performance linear algebra toolkit