Lecture 18: Robust nonlinear solvers#

Recap#

  • In the previous lecture we considered a modified version of Newton’s method in which \(f'(x^{(i)})\) is approximated:

    \[ f'(x^{(i)}) \approx \frac{f(x^{(i)} + \mathrm{d}x) - f(x^{(i)})}{\mathrm{d}x}. \]
  • For a suitable choice of \(\mathrm{d}x\) this was shown to work very well, producing results of the same accuracy as Newton’s method in almost the same number of iterations.

  • We then introduced another popular modification of Newton’s method that also approximates \(f'(x^{(i)})\) as above for a particular choice of \(\mathrm{d}x = x^{(i-1)} - x^{(i)}\).

  • The resulting iteration is known as the secant method.
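
A minimal sketch of the secant iteration in code (the function name secant and its arguments are illustrative, not part of the lecture code):

def secant(f, x0, x1, tol=1.0e-10, maxiter=100):
    """Illustrative secant iteration using the two most recent iterates."""
    f0, f1 = f(x0), f(x1)
    for _ in range(maxiter):
        # x_{i+1} = x_i - f(x_i) (x_i - x_{i-1}) / (f(x_i) - f(x_{i-1}))
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) < tol * abs(x2):
            return x2
        x0, f0 = x1, f1
        x1, f1 = x2, f(x2)
    return x1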

Reliability#

  • The Newton, modified Newton and secant methods may not always converge for a particular choice of \(x^{(0)}\).

  • The secant method in particular will fail if \(x^{(0)} = x^{(1)}\) or \(f(x^{(0)}) = f(x^{(1)})\).

  • Each of these methods can break down when

    • \(f'(x^{(i)}) = 0\) for \(x^{(i)} \neq x^*\), in which case the update is undefined (a small demonstration follows this list);

    • \(f'(x^{(i)})\) is small but nonzero, in which case \(x^{(i+1)}\) may be further away from \(x^*\) than \(x^{(i)}\).

  • These methods are guaranteed to converge when \(x^{(0)}\) is “sufficiently close” to \(x^*\).

  • In practice a good initial estimate \(x^{(0)}\) may not be known in advance.
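
For example, here is a minimal illustration (not from the lecture code) of the first breakdown case: for \(f(x) = x^2 - 2\) the derivative vanishes at \(x^{(0)} = 0\), so the very first Newton update divides by zero even though \(0\) is not a root.

def f(x):
    return x**2 - 2.0


def df(x):
    return 2.0 * x


x0 = 0.0  # f'(0) = 0 although 0 is not a root, so the Newton update is undefined
try:
    x1 = x0 - f(x0) / df(x0)  # raises ZeroDivisionError
except ZeroDivisionError:
    print("Newton step failed: f'(x0) = 0")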

A combined approach#

In this algorithm we seek to combine the reliability of the bisection algorithm with the speed of the secant algorithm (a minimal code sketch follows the steps below):

  1. Take a function \(f(x)\) and initial estimate \(x^{(0)}\).

  2. Search for an initial point \(x^{(1)}\) such that \(f(x^{(0)}) f(x^{(1)}) < 0\), i.e. an initial bracket \([x^{(0)}, x^{(1)}]\) for \(x^*\).

  3. Take a single step with the secant method

    \[ x^{(2)} = x^{(1)} - f(x^{(1)}) \frac{x^{(1)} - x^{(0)}}{f(x^{(1)}) - f(x^{(0)})} \]

    to produce a new estimate.

  4. If \(x^{(2)}\) is outside \([x^{(0)}, x^{(1)}]\) then reject it and apply a single bisection step, i.e. find \(x^{(2)} = (x^{(0)} + x^{(1)}) / 2\).

  5. Update the bracket to

    \[\begin{split} \begin{cases} [x^{(0)}, x^{(2)}] & \text{ if } f(x^{(0)}) f(x^{(2)}) \le 0; \\ [x^{(2)}, x^{(1)}] & \text{ if } f(x^{(2)}) f(x^{(1)}) \le 0. \end{cases} \end{split}\]
  6. If the method has not yet converged return to step 3 with the new interval.
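
A minimal Python sketch of this loop, assuming a valid bracket \([a, b]\) with \(f(a) f(b) \le 0\) is already available; the function name secant_bisection and the default tolerances are illustrative rather than a fixed interface:

def secant_bisection(f, a, b, tolx=1.0e-12, tolfun=1.0e-12, maxiter=100):
    """Hybrid secant/bisection sketch keeping a bracket [a, b] with f(a) f(b) <= 0."""
    fa, fb = f(a), f(b)
    assert fa * fb <= 0.0, "initial interval must bracket a root"
    x_old = a
    for _ in range(maxiter):
        # Step 3: trial secant step using the bracket end points.
        x = b - fb * (b - a) / (fb - fa)
        # Step 4: if the secant step leaves the bracket, bisect instead.
        if not (min(a, b) < x < max(a, b)):
            x = 0.5 * (a + b)
        fx = f(x)
        # Step 5: update the bracket so that it still contains a sign change.
        if fa * fx <= 0.0:
            b, fb = x, fx
        else:
            a, fa = x, fx
        # Step 6: stop if either stopping criterion below is met.
        if abs(x - x_old) < tolx * abs(x) or abs(fx) < tolfun:
            return x
        x_old = x
    return 0.5 * (a + b)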

Notes#

  • When the secant iteration becomes unreliable the algorithm reverts to the bisection approach.

  • When the approximation is close to the root the secant method will usually be used and should converge (almost) as rapidly as Newton.

  • The approach can easily be adapted to find all of the roots in a given interval.

  • Variations and other hybrid methods are implemented in scipy as scipy.optimize.root_scalar.
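
For instance, a bracketed call to scipy.optimize.root_scalar might look like the following (the quadratic test function is illustrative):

from scipy.optimize import root_scalar

# Use a bracketing hybrid method via the generic root_scalar interface.
result = root_scalar(lambda x: x**2 - 2.0, bracket=[0.0, 2.0], method="brentq")
print(result.root, result.iterations, result.converged)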

Stopping criteria#

The algorithm stops if any of the following holds:

  • \({|x^{(i)} - x^{(i-1)}|}/{|x^{(i)}|} < \texttt{tolx}\);

  • \(|f(x^{(i)})| < \texttt{tolfun}\);

  • the number of iterations exceeds a specified maximum \(\texttt{maxiter}\).

Criticisms:

  • ideally a computed root should be required to satisfy both \({|x^{(i)} - x^{(i-1)}|}/{|x^{(i)}|} < \texttt{tolx}\) and \(|f(x^{(i)})| < \texttt{tolfun}\) before the iteration stops (a sketch of such a combined test follows this list);

  • the method cannot find roots at which \(f(x)\) does not change sign, i.e. solutions where the curve touches but does not cross the \(x\)-axis (such as repeated roots).
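
A stricter combined test along the lines of the first criticism might be sketched as follows (the function name converged and the default tolerances are illustrative):

def converged(x_new, x_old, f_new, tolx=1.0e-8, tolfun=1.0e-8):
    """Require both a small relative change in x and a small residual |f(x)|."""
    small_step = abs(x_new - x_old) < tolx * abs(x_new)
    small_residual = abs(f_new) < tolfun
    return small_step and small_residual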

Implementation#

A slightly improved version of this approach is known as Brent’s method, which is implemented in scipy as scipy.optimize.brentq.

Note that this method requires both initial estimates \(x^{(0)}\) and \(x^{(1)}\).

from scipy.optimize import brentq

help(brentq)
Help on function brentq in module scipy.optimize._zeros_py:

brentq(f, a, b, args=(), xtol=2e-12, rtol=8.881784197001252e-16, maxiter=100, full_output=False, disp=True)
    Find a root of a function in a bracketing interval using Brent's method.

    Uses the classic Brent's method to find a root of the function `f` on
    the sign changing interval [a , b]. Generally considered the best of the
    rootfinding routines here. It is a safe version of the secant method that
    uses inverse quadratic extrapolation. Brent's method combines root
    bracketing, interval bisection, and inverse quadratic interpolation. It is
    sometimes known as the van Wijngaarden-Dekker-Brent method. Brent (1973)
    claims convergence is guaranteed for functions computable within [a,b].

    [Brent1973]_ provides the classic description of the algorithm. Another
    description can be found in a recent edition of Numerical Recipes, including
    [PressEtal1992]_. A third description is at
    http://mathworld.wolfram.com/BrentsMethod.html. It should be easy to
    understand the algorithm just by reading our code. Our code diverges a bit
    from standard presentations: we choose a different formula for the
    extrapolation step.

    Parameters
    ----------
    f : function
        Python function returning a number. The function :math:`f`
        must be continuous, and :math:`f(a)` and :math:`f(b)` must
        have opposite signs.
    a : scalar
        One end of the bracketing interval :math:`[a, b]`.
    b : scalar
        The other end of the bracketing interval :math:`[a, b]`.
    xtol : number, optional
        The computed root ``x0`` will satisfy ``np.allclose(x, x0,
        atol=xtol, rtol=rtol)``, where ``x`` is the exact root. The
        parameter must be positive. For nice functions, Brent's
        method will often satisfy the above condition with ``xtol/2``
        and ``rtol/2``. [Brent1973]_
    rtol : number, optional
        The computed root ``x0`` will satisfy ``np.allclose(x, x0,
        atol=xtol, rtol=rtol)``, where ``x`` is the exact root. The
        parameter cannot be smaller than its default value of
        ``4*np.finfo(float).eps``. For nice functions, Brent's
        method will often satisfy the above condition with ``xtol/2``
        and ``rtol/2``. [Brent1973]_
    maxiter : int, optional
        If convergence is not achieved in `maxiter` iterations, an error is
        raised. Must be >= 0.
    args : tuple, optional
        Containing extra arguments for the function `f`.
        `f` is called by ``apply(f, (x)+args)``.
    full_output : bool, optional
        If `full_output` is False, the root is returned. If `full_output` is
        True, the return value is ``(x, r)``, where `x` is the root, and `r` is
        a `RootResults` object.
    disp : bool, optional
        If True, raise RuntimeError if the algorithm didn't converge.
        Otherwise, the convergence status is recorded in any `RootResults`
        return object.

    Returns
    -------
    root : float
        Root of `f` between `a` and `b`.
    r : `RootResults` (present if ``full_output = True``)
        Object containing information about the convergence. In particular,
        ``r.converged`` is True if the routine converged.

    Notes
    -----
    `f` must be continuous.  f(a) and f(b) must have opposite signs.

    Related functions fall into several classes:

    multivariate local optimizers
      `fmin`, `fmin_powell`, `fmin_cg`, `fmin_bfgs`, `fmin_ncg`
    nonlinear least squares minimizer
      `leastsq`
    constrained multivariate optimizers
      `fmin_l_bfgs_b`, `fmin_tnc`, `fmin_cobyla`
    global optimizers
      `basinhopping`, `brute`, `differential_evolution`
    local scalar minimizers
      `fminbound`, `brent`, `golden`, `bracket`
    N-D root-finding
      `fsolve`
    1-D root-finding
      `brenth`, `ridder`, `bisect`, `newton`
    scalar fixed-point finder
      `fixed_point`

    References
    ----------
    .. [Brent1973]
       Brent, R. P.,
       *Algorithms for Minimization Without Derivatives*.
       Englewood Cliffs, NJ: Prentice-Hall, 1973. Ch. 3-4.

    .. [PressEtal1992]
       Press, W. H.; Flannery, B. P.; Teukolsky, S. A.; and Vetterling, W. T.
       *Numerical Recipes in FORTRAN: The Art of Scientific Computing*, 2nd ed.
       Cambridge, England: Cambridge University Press, pp. 352-355, 1992.
       Section 9.3:  "Van Wijngaarden-Dekker-Brent Method."

    Examples
    --------
    >>> def f(x):
    ...     return (x**2 - 1)

    >>> from scipy import optimize

    >>> root = optimize.brentq(f, -2, 0)
    >>> root
    -1.0

    >>> root = optimize.brentq(f, 0, 2)
    >>> root
    1.0

Exercise#

How can we estimate this initial bracket from an initial guess?

Examples#

We will test scipy’s implementation with our three examples from before.

Consider the example with \(f(x) = x^2 - R\) and \(R=2\). Take \(x^{(0)} = 0\) and \(x^{(1)} = 2\) and use the function call

def sqrt2(x):
    return x**2 - 2.0


brentq(sqrt2, 0.0, 2.0, xtol=1.0e-4, maxiter=100, full_output=True)
(1.4142133199955025,
       converged: True
            flag: converged
  function_calls: 8
      iterations: 7
            root: 1.4142133199955025
          method: brentq)
  • The algorithm gives \(x^* = 1.4142\) after 7 iterations.

Consider again the example with \(f(x) = x^2 - R\) and \(R=2\). Take \(x^{(0)} = 0\) and \(x^{(1)} = 2\), now with a much tighter tolerance, and use the function call

import numpy as np

eps = np.finfo(np.double).eps
brentq(sqrt2, 0.0, 2.0, xtol=4 * eps, maxiter=100, full_output=True)
(1.414213562373095,
       converged: True
            flag: converged
  function_calls: 10
      iterations: 9
            root: 1.414213562373095
          method: brentq)
  • The algorithm gives \(x^* = 1.414214\) after 9 iterations.

  • Convergence is to machine precision - so it takes more iterations than previously - but not too many!

Consider the compound interest example with \([x^{(0)}, x^{(1)}] = [200, 300]\), using the function call

def compound(n):
    # Set P, M and r.
    P = 150000
    M = 1000
    r = 5.0
    # Evaluate the function.
    i = r / 1200
    f = M - P * (i * (1 + i) ** n) / ((1 + i) ** n - 1)

    return f


brentq(compound, 200, 300, xtol=1.0e-1, full_output=True)
(235.87468057187797,
       converged: True
            flag: converged
  function_calls: 6
      iterations: 5
            root: 235.87468057187797
          method: brentq)
  • This converges to the root \(x^* = 235.87\) after 5 iterations (using quite a large stopping tolerance in this case).

Consider the NACA0012 aerofoil example with \([x^{(0)}, x^{(1)}] = [0.5, 1]\) using the function call

def naca0012(x):
    # Set the required thickness at the intersection to be 0.1.
    t = 0.1

    # Evaluate the function.
    yp = (
        -0.1015 * np.power(x, 4)
        + 0.2843 * np.power(x, 3)
        - 0.3516 * np.power(x, 2)
        - 0.126 * x
        + 0.2969 * np.sqrt(x)
    )
    f = yp - 0.5 * t

    return f


brentq(naca0012, 0.5, 1.0, xtol=1.0e-4, full_output=True)
(0.7652489385378323,
       converged: True
            flag: converged
  function_calls: 7
      iterations: 6
            root: 0.7652489385378323
          method: brentq)

This converges to the root \(x^* = 0.765249\) in 6 iterations.

Consider the NACA0012 aerofoil example with \([x^{(0)}, x^{(1)}] = [0, 0.5]\) using the function call

brentq(naca0012, 0.0, 0.5, xtol=1.0e-4, full_output=True)
(0.03390094402176156,
       converged: True
            flag: converged
  function_calls: 9
      iterations: 8
            root: 0.03390094402176156
          method: brentq)
  • This converges to the other root \(x^* = 0.033901\) after 8 iterations.

Summary#

  • Solving nonlinear equations is hard.

  • It is not always possible to guarantee an accurate solution.

  • However, it is possible to design a robust algorithm that will usually give a good answer.

  • Finding all possible solutions is particularly challenging since we may not know how many solutions there are in advance.

Further reading#

The slides used in the lecture are also available