Imagine we want to compute the square root of a number $n$. The basic idea of Heron’s method, named after the mathematician and engineer Heron of Alexandria, is to find a number $g$ that is close to $\sqrt{n}$ and to then average it with the number $n/g$, which corrects for the fact that $g$ either over- or underestimates $\sqrt{n}$.

I like this algorithm because it is simple and works surprisingly
well. However, I first learned about it in (Benjamin & Shermer, 2006), which
did not provide a particularly deep explanation or analysis for why this method works. The goal of this post is to better
understand Heron’s method. How does it work? Why does it work? And how good are
the estimates?

The algorithm

Let’s demonstrate the method with an example. Consider computing the square root of $n=33$. We start by finding a number that forms a perfect square that is close to $33$. Here, let’s pick $g=6$, since $6^2 = 36$. Then we compute a second number, $b = n/g$. In practice, computing $b$ in your head may require an approximation. Here, we can compute it exactly as $33/6 = 5.5$. Then our final guess is the average of these two numbers or

$$\sqrt{n} \approx \frac{g+b}{2}, \tag{1}$$

which in our example is

$$\sqrt{33} = 5.74456264654\ldots \approx \frac{6 + 5.5}{2} = 5.75. \tag{2}$$

That is pretty good. The relative error is less than $0.1\%$! And furthermore, this is pretty straightforward to do in your head when $n$ isn’t too large.
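As a quick sketch in code (my own, not part of the original method, and cheating by using `math.sqrt` to locate the nearby perfect square that you would recall from memory when working by hand):

```python
import math

def heron(n):
    """Estimate sqrt(n) with one step of Heron's method."""
    g = round(math.sqrt(n))  # integer whose square is nearest n (6 for n=33)
    b = n / g                # the correcting second number, n/g
    return (g + b) / 2       # average the over- and underestimate

print(heron(33))  # 5.75
```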

Why does this work?

Intuitively, Heron’s method works because $g$ is either an over- or underestimate of $\sqrt{n}$. Then $n/g$ is an under- or overestimate, respectively, of $\sqrt{n}$. So the average of $g$ and $n/g$ should be closer to $\sqrt{n}$ than either $g$ or $n/g$ is.

While this was probably Heron’s reasoning, we can offer a better explanation using calculus: Heron’s method works because we’re performing a first-order Taylor approximation around our initial guess. More specifically, we’re making a linear approximation of the nonlinear square root function at the point $g^2$.

To see this, recall that the general form of a Taylor expansion about a point $a$ is

$$f(x) = f(a) + \sum_{k=1}^{\infty} \frac{f^{(k)}(a)(x-a)^k}{k!}, \tag{3}$$

where the notation $f^{(k)}$ denotes the $k$-th derivative of $f$. If we define $f(x) = \sqrt{x}$, then

$$f^{\prime}(x) = \frac{1}{2\sqrt{x}}, \tag{4}$$

and so the first-order Taylor approximation of $\sqrt{x}$ is

$$\sqrt{x} \approx \sqrt{a} + \frac{x-a}{2\sqrt{a}}. \tag{5}$$

Now choose $x=n$, and let $h(x)$ denote the Taylor expansion around $a = g^2$. Then we have

$$\sqrt{n} \approx h(n) = g + \frac{n - g^2}{2g} = \frac{g + n/g}{2} = \frac{g+b}{2}. \tag{6}$$

And this is exactly what we calculated above.
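We can sanity-check this chain of equalities numerically; for $n=33$ and $g=6$, the linearization at $a = g^2$ and Heron’s average coincide exactly:

```python
import math

n, g = 33, 6
taylor = g + (n - g**2) / (2 * g)  # h(n): linearization of sqrt at a = g^2
heron = (g + n / g) / 2            # Heron's average of g and n/g
print(taylor, heron)               # both 5.75; the true value is 5.7445...
```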

Geometric interpretation

In general, a first-order Taylor expansion approximates a possibly nonlinear function $f(x)$ with a linear function at the point $a$:

$$f(x) \approx \overbrace{f(a) + f^{\prime}(a)(x-a)}^{\text{$y$-intercept $+$ slope $\times$ $x$-shift}}. \tag{7}$$

Thus, the Taylor approximation represents the tangent line to $f(x)$ at the point $(a, f(a))$. We can see this for $f(x) = \sqrt{x}$ in Figure 1. This is a useful visualization because it highlights something interesting: we expect Heron’s approximation to be worse for smaller numbers. That’s because the square root function is “more curved” (speaking loosely) for numbers closer to zero (Figure 2, left). As the square root function flattens out for larger numbers, the linear approximation improves (Figure 2, right).

Figure 1. A visualization of Heron’s method, or a first-order Taylor approximation of $f(x) = \sqrt{x}$. We construct a linear approximation $h(x)$ (red dashed line) to the nonlinear function $f(x)$ (blue line). We then guess $h(n)$ (black dot). Our error is the absolute vertical difference between $h(n)$ (black dot) and $\sqrt{n}$ (blue dot).

How good is the approximation?

How good is this method? Did we just get lucky with $n=33$, or does Heron’s method typically produce sensible estimates of $\sqrt{n}$?

To answer this question, I’m replicating a nice figure from an article from MathemAfrica, in which the author makes a plot with the input number $n$ on the $x$-axis and the absolute error

$$\left| \sqrt{n} - h(n) \right| \tag{8}$$

on the $y$-axis (Figure 2, blue line). (Note that when programming Heron’s method, we must decide whether to find $g^2$ by searching numbers greater than or less than $n$; here, I’ve set $g$ to the largest integer not exceeding the square root of $n$, or `g = floor(sqrt(n))`.) I like this figure because it captures two interesting properties of Heron’s method. First, as we discussed above, the Taylor approximation will typically be worse when $n$ is small (when $f(x) = \sqrt{x}$; this is not true in general). And second, the error drops to zero on perfect squares and increases roughly linearly between perfect squares.

Figure 2. The absolute (blue) and relative (red) errors between the true value $\sqrt{n}$ and the estimate using Heron’s approximation $h(n)$. The errors are zero when $n$ is a perfect square and are larger on average for small $n$ than for large $n$.
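The data behind this figure can be regenerated in a few lines (a sketch, using `g = floor(sqrt(n))` as described above):

```python
import math

def heron(n):
    g = math.floor(math.sqrt(n))  # largest integer at or below sqrt(n)
    return (g + n / g) / 2        # Heron's estimate h(n)

ns = range(2, 100)
abs_errors = [abs(math.sqrt(n) - heron(n)) for n in ns]
rel_errors = [abs(math.sqrt(n) - heron(n)) / math.sqrt(n) for n in ns]

print(abs(math.sqrt(49) - heron(49)))  # 0.0: the error vanishes on perfect squares
```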

The MathemAfrica post focuses on lowering this absolute error by judiciously picking the initial guess $g$. This is interesting analysis. However, in my mind, it is unnecessarily complicated for most practical mental math scenarios, i.e. for quick sanity checking rather than for a demonstration or competition. Why is it overly complicated? Well, the relative error,

$$\frac{\left|\sqrt{n} - h(n) \right|}{\sqrt{n}}, \tag{9}$$

rapidly decays to less than a percentage point or so (Figure 2, red line).

If
you’re not using a calculator to compute a square root, you’re probably just
getting a rough idea of a problem. And if we actually wanted to lower the absolute error and didn’t care about a human’s
mental limits, we should just expand the Taylor approximation to higher orders.
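For instance, here is a sketch of what a second-order correction looks like, using the fact that $f^{\prime\prime}(x) = -\tfrac{1}{4} x^{-3/2}$, so the extra term at $a = g^2$ is $-(n-g^2)^2 / (8g^3)$:

```python
import math

def taylor_sqrt(n, g, order=2):
    """Taylor expansion of sqrt(x) about a = g^2, evaluated at x = n."""
    d = n - g * g                 # x - a
    est = float(g)                # f(a) = sqrt(g^2) = g
    if order >= 1:
        est += d / (2 * g)        # f'(a) = 1 / (2g); this term alone is Heron
    if order >= 2:
        est -= d**2 / (8 * g**3)  # f''(a) / 2! = -1 / (8 g^3)
    return est

print(taylor_sqrt(33, 6, order=1))  # 5.75
print(taylor_sqrt(33, 6, order=2))  # 5.7447916..., closer to 5.7445626...
```

The second-order term shrinks the error for $n=33$ by more than an order of magnitude, at the cost of an expression few of us would want to evaluate mentally.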
