What is root finding and why do it on a computer?

Root finding briefly explained

Root finding means finding the solutions of an equation. For example, you might be asked something like:

  • Find values of \(x\) that satisfy \(x^2 - 1 = 0\).

This case can be solved analytically. The equation holds exactly when \(x^2 = 1\), and this happens in two cases: \(x=1\) or \(x=-1\). These two values are the roots (or solutions) of the equation.
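A quick way to confirm a candidate root is to substitute it back into the left-hand side and check that the result is zero. The following minimal sketch does this in Python for the two roots found above. The function name `f` is a hypothetical helper chosen for illustration.

```python
def f(x):
    """Left-hand side of the equation x^2 - 1 = 0 (hypothetical helper)."""
    return x**2 - 1

# Substitute each analytical root back into f; a true root gives exactly 0.
for root in (1, -1):
    print(root, f(root))
```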

Why do we need a computer?

Some equations are difficult (or impossible) to solve analytically. Consider, for example, non-linear equations such as:

  • Polynomial equations, that is, equations of the form \(a_n x^n + a_{n-1}x^{n-1} + \dotsb + a_2 x^2 + a_1 x + a_0 = 0\). For example, \(x^5+7x^3-18x^2-4=0\).

  • Trigonometric equations, that is, equations which include trigonometric functions. For example \(a \sin(x) +b\cos(x) = c\).

  • Mixed equations, that is, equations which contain different types of functions, e.g., \(e^{-2x} = \sin{x}\).
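For polynomial equations in particular, numerical libraries can approximate all roots at once. The sketch below assumes NumPy is available and uses `numpy.roots` on the example quintic \(x^5+7x^3-18x^2-4=0\). The variable names are illustrative only. The returned roots are floating-point approximations and may be complex, so each one is checked by evaluating the polynomial at it.

```python
import numpy as np

# Coefficients of x^5 + 7x^3 - 18x^2 - 4, highest power first.
# Missing powers (x^4 and x^1) must be given explicitly as 0.
coeffs = [1, 0, 7, -18, 0, -4]

# Numerical approximations to all five roots (possibly complex).
roots = np.roots(coeffs)

for r in roots:
    # np.polyval evaluates the polynomial at r; the residual should be tiny.
    print(r, abs(np.polyval(coeffs, r)))

# Keep only the roots whose imaginary part is (numerically) zero.
real_roots = roots[np.isclose(roots.imag, 0)].real
print("real roots:", real_roots)
```

Note that the residuals are small but generally not exactly zero: a numerical method returns an approximation, not an exact symbolic answer.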

Some of these can be solved analytically, but it's easy to see how we can keep adding and mixing terms until we end up with very complicated equations.

Numerical (computational) root finding in short

Numerical root finding works by first taking an equation, for example

\[e^{-2x} = \sin{x}.\]

Generally, this is then rearranged so that it is of the form \(f(x) = 0\). So here:

\[e^{-2x} - \sin{x} = 0. \]
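Once the equation is in the form \(f(x) = 0\), evaluating \(f\) at trial values already tells us something useful. The minimal sketch below defines the rearranged function with Python's standard `math` module and evaluates it at a few guesses; the name `f` and the chosen guesses are illustrative assumptions.

```python
import math

def f(x):
    """Rearranged form f(x) = e^{-2x} - sin(x); its roots solve e^{-2x} = sin(x)."""
    return math.exp(-2 * x) - math.sin(x)

# Evaluate f at a few trial guesses.
for guess in (0.0, 0.5, 1.0):
    print(f"f({guess}) = {f(guess):.4f}")
```

Here \(f(0) = 1 > 0\) while \(f(0.5) < 0\). Since \(f\) is continuous, it must cross zero somewhere between 0 and 0.5, so a root lies in that interval.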

Then we take a guess at a solution (for example, \(x=0\)), compute the value of the function at that guess (possibly along with other information, such as an estimate of its derivative at that point), and use this information to make an improved guess. We repeat this until we find a value of \(x\) for which \(f(x)\) is suitably close to 0. This value is our (approximate) solution.
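One common way to turn "use the value and the derivative to improve the guess" into an algorithm is Newton's method, which updates a guess via \(x \leftarrow x - f(x)/f'(x)\). The sketch below applies it to \(f(x) = e^{-2x} - \sin x\) starting from \(x = 0\). This is an illustration of the general idea, not necessarily the method this notebook develops; the names `newton`, `df`, `tol` and `max_iter` are hypothetical choices.

```python
import math

def f(x):
    return math.exp(-2 * x) - math.sin(x)

def df(x):
    # Derivative of f: d/dx [e^{-2x} - sin x] = -2 e^{-2x} - cos x
    return -2 * math.exp(-2 * x) - math.cos(x)

def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Hypothetical helper: improve guess x0 with x <- x - f(x)/f'(x)."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:  # f(x) is suitably close to 0: accept x
            return x
        x = x - fx / df(x)  # improved guess from value and slope
    raise RuntimeError("did not converge within max_iter iterations")

root = newton(f, df, x0=0.0)
print(root, f(root))
```

The stopping test checks \(|f(x)|\) against a tolerance rather than requiring \(f(x) = 0\) exactly, which matches the idea of stopping once \(f(x)\) is suitably close to 0.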

Notation

We use some mathematical notation in this notebook. We explain it here so that you can read it now and refer back if you later encounter something you're not sure about.

The set of all natural numbers, that is, all positive integers \(1, 2, 3, 4, 5, \ldots\), is denoted by \(\mathbb{N}\).

The set of all real numbers is denoted by \(\mathbb{R}\). The real numbers include all integers, all rational numbers (fractions) and all irrational numbers (for example, \(\sqrt{2}\), \(\pi\) or \(e\)).

If a variable \(n\) is a natural number, we write \(n \in \mathbb{N}\); the symbol \(\in\) means "is an element of" (or simply "in").
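In code, a statement such as \(n \in \mathbb{N}\) becomes a membership test. The sketch below is a hypothetical helper `is_natural` that follows the convention above, where \(\mathbb{N}\) starts at 1. It assumes Python's built-in `int` stands in for the integers; `bool` is excluded because Python treats `True` and `False` as integers.

```python
def is_natural(n):
    """Hypothetical helper: True if n is in N = {1, 2, 3, ...}."""
    # bool is a subclass of int in Python, so exclude True/False explicitly
    return isinstance(n, int) and not isinstance(n, bool) and n >= 1

print(is_natural(3))    # True
print(is_natural(0))    # False: 0 is not in N under this convention
print(is_natural(2.5))  # False: not an integer
```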

Given two real numbers \(a\) and \(b\) such that \(a<b\):

  • the interval \([a, b]\) is all real numbers which are greater than or equal to \(a\) and less than or equal to \(b\). This includes \(a\) and \(b\).

  • the interval \((a, b)\) is all real numbers which are greater than \(a\) and less than \(b\). This excludes \(a\) and \(b\).

Instead of writing "\(x\) is an element of the interval \([a, b]\)", we simply write \(x \in [a, b]\).

Similarly, instead of writing "\(x\) is an element of the interval \((a, b)\)", we simply write \(x \in (a, b)\).
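The difference between a closed and an open interval comes down to whether the comparison with the endpoints is strict. The sketch below uses two hypothetical helpers, `in_closed` and `in_open`, to express \(x \in [a, b]\) and \(x \in (a, b)\) with Python's chained comparisons.

```python
def in_closed(x, a, b):
    """Hypothetical helper for x in [a, b]: endpoints a and b are included."""
    return a <= x <= b

def in_open(x, a, b):
    """Hypothetical helper for x in (a, b): endpoints a and b are excluded."""
    return a < x < b

print(in_closed(1.0, 1.0, 2.0))  # True: a itself belongs to [a, b]
print(in_open(1.0, 1.0, 2.0))    # False: a does not belong to (a, b)
print(in_open(1.5, 1.0, 2.0))    # True
```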