Lecture 18: Robust nonlinear solvers#
Recap#
In the previous lecture we considered a modified version of Newton’s method in which \(f'(x^{(i)})\) is approximated:
\[ f'(x^{(i)}) \approx \frac{f(x^{(i)} + \mathrm{d}x) - f(x^{(i)})}{\mathrm{d}x}. \]
For a suitable choice of \(\mathrm{d}x\) this was shown to work very well, producing results of the same accuracy as Newton’s method in almost the same number of iterations.
We then introduced another popular modification of Newton’s method that also approximates \(f'(x^{(i)})\) as above for a particular choice of \(\mathrm{d}x = x^{(i-1)} - x^{(i)}\).
The resulting iteration is known as the secant method.
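As a quick reminder, the secant iteration can be sketched in a few lines of Python. This is a minimal sketch; the function and variable names are illustrative rather than taken from the lecture code.

def secant(f, x0, x1, tol=1.0e-10, maxiter=100):
    # Iterate x2 = x1 - f(x1) * (x1 - x0) / (f(x1) - f(x0)),
    # i.e. Newton's method with f' replaced by the secant slope.
    for _ in range(maxiter):
        f0, f1 = f(x0), f(x1)
        if f1 == f0:
            break  # the secant slope is zero: the method breaks down
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) < tol * abs(x2):
            return x2
        x0, x1 = x1, x2
    return x1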
Reliability#
The Newton, modified Newton and secant methods may not always converge for a particular choice of \(x^{(0)}\).
The secant method in particular will fail if \(x^{(0)} = x^{(1)}\) or \(f(x^{(0)}) = f(x^{(1)})\).
Each of these methods can break down when
\(f'(x^{(i)}) = 0\) for \(x^{(i)} \neq x^*\);
\(f'(x^{(i)})\) is small but nonzero, in which case \(x^{(i+1)}\) may be further away from \(x^*\) than \(x^{(i)}\).
These methods are only guaranteed to converge when \(x^{(0)}\) is “sufficiently close” to \(x^*\).
In practice a good initial estimate \(x^{(0)}\) may not be known in advance.
A combined approach#
In this algorithm we seek to combine the reliability of the bisection algorithm with the speed of the secant algorithm:

1. Take a function \(f(x)\) and an initial estimate \(x^{(0)}\).
2. Search for an initial point \(x^{(1)}\) such that \(f(x^{(0)}) f(x^{(1)}) < 0\), i.e. an initial bracket \([x^{(0)}, x^{(1)}]\) for \(x^*\).
3. Take a single step with the secant method
   \[ x^{(2)} = x^{(1)} - f(x^{(1)}) \frac{x^{(1)} - x^{(0)}}{f(x^{(1)}) - f(x^{(0)})} \]
   to produce a new estimate.
4. If \(x^{(2)}\) is outside \([x^{(0)}, x^{(1)}]\), reject it and apply a single bisection step instead, i.e. take \(x^{(2)} = (x^{(0)} + x^{(1)}) / 2\).
5. Update the bracket to
   \[\begin{split} \begin{cases} [x^{(0)}, x^{(2)}] & \text{ if } f(x^{(0)}) f(x^{(2)}) \le 0; \\ [x^{(2)}, x^{(1)}] & \text{ if } f(x^{(2)}) f(x^{(1)}) \le 0. \end{cases} \end{split}\]
   If the method has not yet converged, return to step 3 with the new interval.
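A minimal Python sketch of this combined iteration is given below. It assumes a valid bracket \([x^{(0)}, x^{(1)}]\) has already been found, and the name secant_bisection and the tolerances are illustrative rather than a standard routine.

def secant_bisection(f, x0, x1, tolx=1.0e-10, maxiter=100):
    # Hybrid root finder: try a secant step, and fall back to bisection
    # whenever the secant estimate leaves the current bracket.
    # Assumes f(x0) and f(x1) have opposite signs.
    f0, f1 = f(x0), f(x1)
    x2 = x1
    for _ in range(maxiter):
        if f1 != f0:
            x2 = x1 - f1 * (x1 - x0) / (f1 - f0)  # secant step
        else:
            x2 = 0.5 * (x0 + x1)                  # degenerate slope: bisect
        if not (min(x0, x1) < x2 < max(x0, x1)):
            x2 = 0.5 * (x0 + x1)                  # step left the bracket: bisect
        f2 = f(x2)
        # Keep the half of the bracket that still contains a sign change.
        if f0 * f2 <= 0.0:
            x1, f1 = x2, f2
        else:
            x0, f0 = x2, f2
        if abs(x1 - x0) < tolx * max(abs(x0), abs(x1), 1.0):
            break
    return x2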
Notes#
When the secant iteration becomes unreliable the algorithm reverts to the bisection approach.
When the approximation is close to the root the secant method will usually be used and should converge (almost) as rapidly as Newton.
The approach can easily be adapted to find all of the roots in a given interval.
Variations and other hybrid methods are implemented in scipy as scipy.optimize.root_scalar.
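For example, a bracketed call to a hybrid solver might look like this (an illustrative usage sketch rather than code from the lecture):

from scipy.optimize import root_scalar

# Find the root of x**2 - 2 in the bracket [0, 2] using Brent's method.
sol = root_scalar(lambda x: x**2 - 2.0, bracket=[0.0, 2.0], method="brentq")
print(sol.root, sol.iterations, sol.converged)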
Stopping criteria#
The algorithm stops if any of the following holds:
\({|x^{(i)} - x^{(i-1)}|}/{|x^{(i)}|} < \texttt{tolx}\);
\(|f(x^{(i)})| < \texttt{tolfun}\);
the number of iterations exceeds a specified number \(\texttt{maxiter}\).
Criticisms:
convergence criteria should ideally satisfy both \({|x^{(i)} - x^{(i-1)}|}/{|x^{(i)}|} < \texttt{tolx}\) and \(|f(x^{(i)})| < \texttt{tolfun}\);
cannot find solutions which do not cross the \(x\)-axis.
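A stricter test addressing the first criticism would require both conditions to hold before stopping. A small sketch, with illustrative names, is:

def is_converged(x_new, x_old, f_new, tolx=1.0e-8, tolfun=1.0e-8):
    # Require both a small relative change in x and a small residual |f(x)|.
    small_step = abs(x_new - x_old) < tolx * abs(x_new)
    small_residual = abs(f_new) < tolfun
    return small_step and small_residual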
Implementation#
The method above has been slightly improved (most notably by adding inverse quadratic interpolation) to become Brent’s method, which is implemented in scipy as scipy.optimize.brentq.
Note that this method requires both initial estimates \(x^{(0)}\) and \(x^{(1)}\).
from scipy.optimize import brentq
help(brentq)
Help on function brentq in module scipy.optimize._zeros_py:
brentq(f, a, b, args=(), xtol=2e-12, rtol=8.881784197001252e-16, maxiter=100, full_output=False, disp=True)
Find a root of a function in a bracketing interval using Brent's method.
Uses the classic Brent's method to find a root of the function `f` on
the sign changing interval [a , b]. Generally considered the best of the
rootfinding routines here. It is a safe version of the secant method that
uses inverse quadratic extrapolation. Brent's method combines root
bracketing, interval bisection, and inverse quadratic interpolation. It is
sometimes known as the van Wijngaarden-Dekker-Brent method. Brent (1973)
claims convergence is guaranteed for functions computable within [a,b].
[Brent1973]_ provides the classic description of the algorithm. Another
description can be found in a recent edition of Numerical Recipes, including
[PressEtal1992]_. A third description is at
http://mathworld.wolfram.com/BrentsMethod.html. It should be easy to
understand the algorithm just by reading our code. Our code diverges a bit
from standard presentations: we choose a different formula for the
extrapolation step.
Parameters
----------
f : function
Python function returning a number. The function :math:`f`
must be continuous, and :math:`f(a)` and :math:`f(b)` must
have opposite signs.
a : scalar
One end of the bracketing interval :math:`[a, b]`.
b : scalar
The other end of the bracketing interval :math:`[a, b]`.
xtol : number, optional
The computed root ``x0`` will satisfy ``np.allclose(x, x0,
atol=xtol, rtol=rtol)``, where ``x`` is the exact root. The
parameter must be positive. For nice functions, Brent's
method will often satisfy the above condition with ``xtol/2``
and ``rtol/2``. [Brent1973]_
rtol : number, optional
The computed root ``x0`` will satisfy ``np.allclose(x, x0,
atol=xtol, rtol=rtol)``, where ``x`` is the exact root. The
parameter cannot be smaller than its default value of
``4*np.finfo(float).eps``. For nice functions, Brent's
method will often satisfy the above condition with ``xtol/2``
and ``rtol/2``. [Brent1973]_
maxiter : int, optional
If convergence is not achieved in `maxiter` iterations, an error is
raised. Must be >= 0.
args : tuple, optional
Containing extra arguments for the function `f`.
`f` is called by ``apply(f, (x)+args)``.
full_output : bool, optional
If `full_output` is False, the root is returned. If `full_output` is
True, the return value is ``(x, r)``, where `x` is the root, and `r` is
a `RootResults` object.
disp : bool, optional
If True, raise RuntimeError if the algorithm didn't converge.
Otherwise, the convergence status is recorded in any `RootResults`
return object.
Returns
-------
root : float
Root of `f` between `a` and `b`.
r : `RootResults` (present if ``full_output = True``)
Object containing information about the convergence. In particular,
``r.converged`` is True if the routine converged.
Notes
-----
`f` must be continuous. f(a) and f(b) must have opposite signs.
Related functions fall into several classes:
multivariate local optimizers
`fmin`, `fmin_powell`, `fmin_cg`, `fmin_bfgs`, `fmin_ncg`
nonlinear least squares minimizer
`leastsq`
constrained multivariate optimizers
`fmin_l_bfgs_b`, `fmin_tnc`, `fmin_cobyla`
global optimizers
`basinhopping`, `brute`, `differential_evolution`
local scalar minimizers
`fminbound`, `brent`, `golden`, `bracket`
N-D root-finding
`fsolve`
1-D root-finding
`brenth`, `ridder`, `bisect`, `newton`
scalar fixed-point finder
`fixed_point`
References
----------
.. [Brent1973]
Brent, R. P.,
*Algorithms for Minimization Without Derivatives*.
Englewood Cliffs, NJ: Prentice-Hall, 1973. Ch. 3-4.
.. [PressEtal1992]
Press, W. H.; Flannery, B. P.; Teukolsky, S. A.; and Vetterling, W. T.
*Numerical Recipes in FORTRAN: The Art of Scientific Computing*, 2nd ed.
Cambridge, England: Cambridge University Press, pp. 352-355, 1992.
Section 9.3: "Van Wijngaarden-Dekker-Brent Method."
Examples
--------
>>> def f(x):
... return (x**2 - 1)
>>> from scipy import optimize
>>> root = optimize.brentq(f, -2, 0)
>>> root
-1.0
>>> root = optimize.brentq(f, 0, 2)
>>> root
1.0
Exercise#
How can we estimate this initial bracket from an initial guess?
Examples#
We will test scipy’s implementation with our three examples from before.
Consider the example with \(f(x) = x^2 - R\) and \(R=2\). Take \(x^{(0)} = 0\) and \(x^{(1)} = 2\) and use the function call
def sqrt2(x):
    return x**2 - 2.0
brentq(sqrt2, 0.0, 2.0, xtol=1.0e-4, maxiter=100, full_output=True)
(1.4142133199955025,
converged: True
flag: converged
function_calls: 8
iterations: 7
root: 1.4142133199955025
method: brentq)
The algorithm gives \(x^* = 1.4142\) after 7 iterations.
Consider again the example with \(f(x) = x^2 - R\) and \(R=2\). Take \(x^{(0)} = 0\) and \(x^{(1)} = 2\) and use the function call
import numpy as np
eps = np.finfo(np.double).eps
brentq(sqrt2, 0.0, 2.0, xtol=4 * eps, maxiter=100, full_output=True)
(1.414213562373095,
converged: True
flag: converged
function_calls: 10
iterations: 9
root: 1.414213562373095
method: brentq)
The algorithm gives \(x^* = 1.414214\) after 9 iterations.
Convergence is to machine precision, so it takes more iterations than previously, but not too many!
Consider the compound interest example with \([x^{(0)}, x^{(1)}] = [200, 300]\), using the function call
def compound(n):
    # Set P, M and r.
    P = 150000
    M = 1000
    r = 5.0
    # Evaluate the function.
    i = r / 1200
    f = M - P * (i * (1 + i) ** n) / ((1 + i) ** n - 1)
    return f
brentq(compound, 200, 300, xtol=1.0e-1, full_output=True)
(235.87468057187797,
converged: True
flag: converged
function_calls: 6
iterations: 5
root: 235.87468057187797
method: brentq)
This converges to the root \(x^* = 235.87\) after 5 iterations (using quite a large stopping tolerance in this case).
Consider the NACA0012 aerofoil example with \([x^{(0)}, x^{(1)}] = [0.5, 1]\) using the function call
def naca0012(x):
    # Set the required thickness at the intersection to be 0.1.
    t = 0.1
    # Evaluate the function.
    yp = (
        -0.1015 * np.power(x, 4)
        + 0.2843 * np.power(x, 3)
        - 0.3516 * np.power(x, 2)
        - 0.126 * x
        + 0.2969 * np.sqrt(x)
    )
    f = yp - 0.5 * t
    return f
brentq(naca0012, 0.5, 1.0, xtol=1.0e-4, full_output=True)
(0.7652489385378323,
converged: True
flag: converged
function_calls: 7
iterations: 6
root: 0.7652489385378323
method: brentq)
This converges to the root \(x^* = 0.765249\) in 6 iterations.
Consider the NACA0012 aerofoil example with \([x^{(0)}, x^{(1)}] = [0, 0.5]\) using the function call
brentq(naca0012, 0.0, 0.5, xtol=1.0e-4, full_output=True)
(0.03390094402176156,
converged: True
flag: converged
function_calls: 9
iterations: 8
root: 0.03390094402176156
method: brentq)
This converges to the other root \(x^* = 0.033901\) after 8 iterations.
Summary#
Solving nonlinear equations is hard.
It is not always possible to guarantee an accurate solution.
However, it is possible to design a robust algorithm that will usually give a good answer.
Finding all possible solutions is particularly challenging since we may not know how many solutions there are in advance.
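One simple, though not foolproof, strategy is to scan the interval of interest for sign changes and apply a bracketing solver such as brentq on each sign-changing subinterval. The sketch below assumes every root we care about is a sign change of \(f\); the helper name find_all_roots is illustrative.

import numpy as np
from scipy.optimize import brentq

def find_all_roots(f, a, b, n=1000):
    # Sample f on a grid over [a, b] and run brentq on every subinterval
    # where the sign changes.  Roots where f touches but does not cross
    # the x-axis are missed, as are pairs of roots inside one subinterval.
    xs = np.linspace(a, b, n + 1)
    fs = [f(x) for x in xs]
    roots = [x for x, fx in zip(xs, fs) if fx == 0.0]
    for x0, x1, f0, f1 in zip(xs[:-1], xs[1:], fs[:-1], fs[1:]):
        if f0 * f1 < 0.0:
            roots.append(brentq(f, x0, x1))
    return sorted(roots)

For example, find_all_roots(naca0012, 0.0, 1.0) should recover both of the intersection points found above.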
Further reading#
scipy: Optimization and root finding (scipy.optimize)
The slides used in the lecture are also available.