Lecture 18: Robust nonlinear solvers#
Recap#
In the previous lecture we considered a modified version of Newton’s method in which \(f'(x^{(i)})\) is approximated:
\[ f'(x^{(i)}) \approx \frac{f(x^{(i)} + \mathrm{d}x) - f(x^{(i)})}{\mathrm{d}x}. \]
For a suitable choice of \(\mathrm{d}x\) this was shown to work very well, producing results of the same accuracy as Newton’s method in almost the same number of iterations.
We then introduced another popular modification of Newton’s method that also approximates \(f'(x^{(i)})\) as above for a particular choice of \(\mathrm{d}x = x^{(i-1)} - x^{(i)}\).
The resulting iteration is known as the secant method.
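As a quick reminder, the secant iteration can be sketched in a few lines of Python. This is a minimal sketch; the function and variable names are illustrative rather than taken from the lecture code.

def secant(f, x0, x1, tol=1.0e-10, maxiter=100):
    # Iterate x2 = x1 - f(x1) * (x1 - x0) / (f(x1) - f(x0)),
    # i.e. Newton's method with f' replaced by the secant slope.
    for _ in range(maxiter):
        f0, f1 = f(x0), f(x1)
        if f1 == f0:
            break  # the secant slope is zero: the method breaks down
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) < tol * abs(x2):
            return x2
        x0, x1 = x1, x2
    return x1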
Reliability#
The Newton, modified Newton and secant methods may not always converge for a particular choice of \(x^{(0)}\).
The secant method in particular will fail if \(x^{(0)} = x^{(1)}\) or \(f(x^{(0)}) = f(x^{(1)})\).
Each of these methods can break down when
\(f'(x^{(i)}) = 0\) for \(x^{(i)} \neq x^*\);
\(f'(x^{(i)})\) is small but nonzero, in which case \(x^{(i+1)}\) may be further away from \(x^*\) than \(x^{(i)}\).
These methods are only guaranteed to converge when \(x^{(0)}\) is “sufficiently close” to \(x^*\).
In practice a good initial estimate \(x^{(0)}\) may not be known in advance.
A combined approach#
In this algorithm we seek to combine the reliability of the bisection algorithm with the speed of the secant algorithm:

1. Take a function \(f(x)\) and an initial estimate \(x^{(0)}\).
2. Search for an initial point \(x^{(1)}\) such that \(f(x^{(0)}) f(x^{(1)}) < 0\), i.e. an initial bracket \([x^{(0)}, x^{(1)}]\) for \(x^*\).
3. Take a single step with the secant method
   \[ x^{(2)} = x^{(1)} - f(x^{(1)}) \frac{x^{(1)} - x^{(0)}}{f(x^{(1)}) - f(x^{(0)})} \]
   to produce a new estimate.
4. If \(x^{(2)}\) is outside \([x^{(0)}, x^{(1)}]\), reject it and apply a single bisection step instead, i.e. take \(x^{(2)} = (x^{(0)} + x^{(1)}) / 2\).
5. Update the bracket to
   \[\begin{split} \begin{cases} [x^{(0)}, x^{(2)}] & \text{ if } f(x^{(0)}) f(x^{(2)}) \le 0; \\ [x^{(2)}, x^{(1)}] & \text{ if } f(x^{(2)}) f(x^{(1)}) \le 0. \end{cases} \end{split}\]
   If the method has not yet converged, return to step 3 with the new interval.
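A minimal Python sketch of this combined iteration is given below. It assumes a valid bracket \([x^{(0)}, x^{(1)}]\) has already been found, and the name secant_bisection and the tolerances are illustrative rather than a standard routine.

def secant_bisection(f, x0, x1, tolx=1.0e-10, maxiter=100):
    # Hybrid root finder: try a secant step, and fall back to bisection
    # whenever the secant estimate leaves the current bracket.
    # Assumes f(x0) and f(x1) have opposite signs.
    f0, f1 = f(x0), f(x1)
    x2 = x1
    for _ in range(maxiter):
        if f1 != f0:
            x2 = x1 - f1 * (x1 - x0) / (f1 - f0)  # secant step
        else:
            x2 = 0.5 * (x0 + x1)                  # degenerate slope: bisect
        if not (min(x0, x1) < x2 < max(x0, x1)):
            x2 = 0.5 * (x0 + x1)                  # step left the bracket: bisect
        f2 = f(x2)
        # Keep the half of the bracket that still contains a sign change.
        if f0 * f2 <= 0.0:
            x1, f1 = x2, f2
        else:
            x0, f0 = x2, f2
        if abs(x1 - x0) < tolx * max(abs(x0), abs(x1), 1.0):
            break
    return x2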
Notes#
When the secant iteration becomes unreliable the algorithm reverts to the bisection approach.
When the approximation is close to the root the secant method will usually be used and should converge (almost) as rapidly as Newton.
The approach can easily be adapted to find all of the roots in a given interval.
Variations and other hybrid methods are implemented in scipy as scipy.optimize.root_scalar.
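For example, a bracketed call to a hybrid solver might look like this (an illustrative usage sketch rather than code from the lecture):

from scipy.optimize import root_scalar

# Find the root of x**2 - 2 in the bracket [0, 2] using Brent's method.
sol = root_scalar(lambda x: x**2 - 2.0, bracket=[0.0, 2.0], method="brentq")
print(sol.root, sol.iterations, sol.converged)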
Stopping criteria#
The algorithm stops if any of the following holds:
\({|x^{(i)} - x^{(i-1)}|}/{|x^{(i)}|} < \texttt{tolx}\);
\(|f(x^{(i)})| < \texttt{tolfun}\);
the number of iterations exceeds a specified number \(\texttt{maxiter}\).
Criticisms:
convergence criteria should ideally satisfy both \({|x^{(i)} - x^{(i-1)}|}/{|x^{(i)}|} < \texttt{tolx}\) and \(|f(x^{(i)})| < \texttt{tolfun}\);
cannot find solutions which do not cross the \(x\)-axis.
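A stricter test addressing the first criticism would require both conditions to hold before stopping. A small sketch, with illustrative names, is:

def is_converged(x_new, x_old, f_new, tolx=1.0e-8, tolfun=1.0e-8):
    # Require both a small relative change in x and a small residual |f(x)|.
    small_step = abs(x_new - x_old) < tolx * abs(x_new)
    small_residual = abs(f_new) < tolfun
    return small_step and small_residual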
Implementation#
The method above has been slightly improved (most notably by adding inverse quadratic interpolation) to become Brent’s method, which is implemented in scipy as scipy.optimize.brentq.
Note that this method requires both initial estimates \(x^{(0)}\) and \(x^{(1)}\).
from scipy.optimize import brentq
help(brentq)
Help on function brentq in module scipy.optimize._zeros_py:
brentq(f, a, b, args=(), xtol=2e-12, rtol=8.881784197001252e-16, maxiter=100, full_output=False, disp=True)
Find a root of a function in a bracketing interval using Brent's method.
Uses the classic Brent's method to find a root of the function `f` on
the sign changing interval [a , b]. Generally considered the best of the
rootfinding routines here. It is a safe version of the secant method that
uses inverse quadratic extrapolation. Brent's method combines root
bracketing, interval bisection, and inverse quadratic interpolation. It is
sometimes known as the van Wijngaarden-Dekker-Brent method. Brent (1973)
claims convergence is guaranteed for functions computable within [a,b].
[Brent1973]_ provides the classic description of the algorithm. Another
description can be found in a recent edition of Numerical Recipes, including
[PressEtal1992]_. A third description is at
http://mathworld.wolfram.com/BrentsMethod.html. It should be easy to
understand the algorithm just by reading our code. Our code diverges a bit
from standard presentations: we choose a different formula for the
extrapolation step.
Parameters
----------
f : function
Python function returning a number. The function :math:`f`
must be continuous, and :math:`f(a)` and :math:`f(b)` must
have opposite signs.
a : scalar
One end of the bracketing interval :math:`[a, b]`.
b : scalar
The other end of the bracketing interval :math:`[a, b]`.
xtol : number, optional
The computed root ``x0`` will satisfy ``np.allclose(x, x0,
atol=xtol, rtol=rtol)``, where ``x`` is the exact root. The
parameter must be positive. For nice functions, Brent's
method will often satisfy the above condition with ``xtol/2``
and ``rtol/2``. [Brent1973]_
rtol : number, optional
The computed root ``x0`` will satisfy ``np.allclose(x, x0,
atol=xtol, rtol=rtol)``, where ``x`` is the exact root. The
parameter cannot be smaller than its default value of
``4*np.finfo(float).eps``. For nice functions, Brent's
method will often satisfy the above condition with ``xtol/2``
and ``rtol/2``. [Brent1973]_
maxiter : int, optional
If convergence is not achieved in `maxiter` iterations, an error is
raised. Must be >= 0.
args : tuple, optional
Containing extra arguments for the function `f`.
`f` is called by ``apply(f, (x)+args)``.
full_output : bool, optional
If `full_output` is False, the root is returned. If `full_output` is
True, the return value is ``(x, r)``, where `x` is the root, and `r` is
a `RootResults` object.
disp : bool, optional
If True, raise RuntimeError if the algorithm didn't converge.
Otherwise, the convergence status is recorded in any `RootResults`
return object.
Returns
-------
root : float
Root of `f` between `a` and `b`.
r : `RootResults` (present if ``full_output = True``)
Object containing information about the convergence. In particular,
``r.converged`` is True if the routine converged.
Notes
-----
`f` must be continuous. f(a) and f(b) must have opposite signs.
Related functions fall into several classes:
multivariate local optimizers
`fmin`, `fmin_powell`, `fmin_cg`, `fmin_bfgs`, `fmin_ncg`
nonlinear least squares minimizer
`leastsq`
constrained multivariate optimizers
`fmin_l_bfgs_b`, `fmin_tnc`, `fmin_cobyla`
global optimizers
`basinhopping`, `brute`, `differential_evolution`
local scalar minimizers
`fminbound`, `brent`, `golden`, `bracket`
N-D root-finding
`fsolve`
1-D root-finding
`brenth`, `ridder`, `bisect`, `newton`
scalar fixed-point finder
`fixed_point`
References
----------
.. [Brent1973]
Brent, R. P.,
*Algorithms for Minimization Without Derivatives*.
Englewood Cliffs, NJ: Prentice-Hall, 1973. Ch. 3-4.
.. [PressEtal1992]
Press, W. H.; Flannery, B. P.; Teukolsky, S. A.; and Vetterling, W. T.
*Numerical Recipes in FORTRAN: The Art of Scientific Computing*, 2nd ed.
Cambridge, England: Cambridge University Press, pp. 352-355, 1992.
Section 9.3: "Van Wijngaarden-Dekker-Brent Method."
Examples
--------
>>> def f(x):
... return (x**2 - 1)
>>> from scipy import optimize
>>> root = optimize.brentq(f, -2, 0)
>>> root
-1.0
>>> root = optimize.brentq(f, 0, 2)
>>> root
1.0
Exercise#
How can we estimate this initial bracket from an initial guess?
Examples#
We will test scipy’s implementation with our three examples from before.
Consider the example with \(f(x) = x^2 - R\) and \(R=2\). Take \(x^{(0)} = 0\) and \(x^{(1)} = 2\) and use the function call
def sqrt2(x):
    return x**2 - 2.0
brentq(sqrt2, 0.0, 2.0, xtol=1.0e-4, maxiter=100, full_output=True)
(1.4142133199955025,
converged: True
flag: converged
function_calls: 8
iterations: 7
root: 1.4142133199955025
method: brentq)
The algorithm gives \(x^* = 1.4142\) after 7 iterations.
Consider again the example with \(f(x) = x^2 - R\) and \(R=2\). Take \(x^{(0)} = 0\) and \(x^{(1)} = 2\) and use the function call
import numpy as np
eps = np.finfo(np.double).eps
brentq(sqrt2, 0.0, 2.0, xtol=4 * eps, maxiter=100, full_output=True)
(1.414213562373095,
converged: True
flag: converged
function_calls: 10
iterations: 9
root: 1.414213562373095
method: brentq)
The algorithm gives \(x^* = 1.414214\) after 9 iterations.
Convergence is to machine precision, so it takes more iterations than previously, but not too many!
Consider the compound interest example with \([x^{(0)}, x^{(1)}] = [200, 300]\), using the function call
def compound(n):
    # Set P, M and r.
    P = 150000
    M = 1000
    r = 5.0
    # Evaluate the function.
    i = r / 1200
    f = M - P * (i * (1 + i) ** n) / ((1 + i) ** n - 1)
    return f
brentq(compound, 200, 300, xtol=1.0e-1, full_output=True)
(235.87468057187797,
converged: True
flag: converged
function_calls: 6
iterations: 5
root: 235.87468057187797
method: brentq)
This converges to the root \(x^* = 235.87\) after 5 iterations (using quite a large stopping tolerance in this case).
Consider the NACA0012 aerofoil example with \([x^{(0)}, x^{(1)}] = [0.5, 1]\) using the function call
def naca0012(x):
    # Set the required thickness at the intersection to be 0.1.
    t = 0.1
    # Evaluate the function.
    yp = (
        -0.1015 * np.power(x, 4)
        + 0.2843 * np.power(x, 3)
        - 0.3516 * np.power(x, 2)
        - 0.126 * x
        + 0.2969 * np.sqrt(x)
    )
    f = yp - 0.5 * t
    return f
brentq(naca0012, 0.5, 1.0, xtol=1.0e-4, full_output=True)
(0.7652489385378323,
converged: True
flag: converged
function_calls: 7
iterations: 6
root: 0.7652489385378323
method: brentq)
This converges to the root \(x^* = 0.765249\) in 6 iterations.
Consider the NACA0012 aerofoil example with \([x^{(0)}, x^{(1)}] = [0, 0.5]\) using the function call
brentq(naca0012, 0.0, 0.5, xtol=1.0e-4, full_output=True)
(0.03390094402176156,
converged: True
flag: converged
function_calls: 9
iterations: 8
root: 0.03390094402176156
method: brentq)
This converges to the other root \(x^* = 0.033901\) after 8 iterations.
Summary#
Solving nonlinear equations is hard.
It is not always possible to guarantee an accurate solution.
However, it is possible to design a robust algorithm that will usually give a good answer.
Finding all possible solutions is particularly challenging since we may not know how many solutions there are in advance.
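One simple, though not foolproof, strategy is to scan the interval of interest for sign changes and apply a bracketing solver such as brentq on each sign-changing subinterval. The sketch below assumes every root we care about is a sign change of \(f\); the helper name find_all_roots is illustrative.

import numpy as np
from scipy.optimize import brentq

def find_all_roots(f, a, b, n=1000):
    # Sample f on a grid over [a, b] and run brentq on every subinterval
    # where the sign changes.  Roots where f touches but does not cross
    # the x-axis are missed, as are pairs of roots inside one subinterval.
    xs = np.linspace(a, b, n + 1)
    fs = [f(x) for x in xs]
    roots = [x for x, fx in zip(xs, fs) if fx == 0.0]
    for x0, x1, f0, f1 in zip(xs[:-1], xs[1:], fs[:-1], fs[1:]):
        if f0 * f1 < 0.0:
            roots.append(brentq(f, x0, x1))
    return sorted(roots)

For example, find_all_roots(naca0012, 0.0, 1.0) should recover both of the intersection points found above.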
Further reading#
scipy: Optimization and root finding (scipy.optimize)
The slides used in the lecture are also available.