Calculate Derivatives for Newton's Method

Just for the heck of it I decided to try to code Newton's method today. Basically it's a way to estimate the roots of a differentiable function by using tangent lines.
Check out the Wikipedia article on it if you haven't heard of it; it's very cool.

However, what I don't understand about converting it to code is how to determine the derivative in x_(n+1) = x_n - f(x_n)/f'(x_n).

This article (http://z0rch.com/2014/02/09/quick-propagation) says you can use the secant method to determine the derivative, but that honestly makes no sense to me… From what I understand, the secant method is another means of finding the zeroes of a function. Could any math wizards out there provide a little explanation? Thanks.

Edit: Also a quick question: when I worked it out on paper using the aforementioned article's solution, I didn't get the correct answer when I started at 3 with the function x^7 - 1000. Using Newton's method with my own calculated derivative (7x^6), I got 2.690 as x_3. However, the article's formula requires an x_(n-1); when I'm at x_1, what do I use as x_(n-1) (which would be x_0)? Could be a stupid question, sorry if it is.
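
For reference, here's my working with the standard update and my hand-calculated derivative:

x_2 = 3 - (3^7 - 1000) / (7 * 3^6) = 3 - 1187/5103 ≈ 2.7674
x_3 = 2.7674 - (2.7674^7 - 1000) / (7 * 2.7674^6) ≈ 2.690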

So the first thing to understand about the Newton-Raphson method is that it is an iterative process. You take the answer from the last iteration (x_n) and stick it into the next iteration (x_(n-1)). Now, as you keenly noticed, this requires an initial starting point. The way to get this is to guess an answer; for a human that's quite simple, but for a computer I don't know of any general method of guessing one. If you choose well and your guess is near a root, the solution will converge quickly (in fewer iterations); if you guess poorly, it will take longer. Also, if the function has multiple roots, the starting point will affect which root is converged upon.
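
For example, a quick sketch of that (using f(x) = x^2 - 4, which has roots at +2 and -2, just to illustrate how the guess picks the root; no guard against f'(x) = 0):

double newton(double x, int iterations) {
  for (int i = 0; i < iterations; i++) {
    x -= (x * x - 4) / (2 * x); // f(x) = x^2 - 4, f'(x) = 2x
  }
  return x;
}

// newton(3, 10) converges to 2.0; newton(-3, 10) converges to -2.0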

Now calculating derivatives. A derivative is actually defined as:

f'(x) = lim (h -> 0) of (f(x + h) - f(x)) / h

Now, computationally, since we are just using approximations anyway, why not approximate the derivative by using a value of h close to 0, like 1e-10 for example? That's very simple for a computer to do. (And although I'd never heard of it before, I thought secant was a trig function; it seems what they mean by the secant method is using the slope between the last two iterates, (f(x_n) - f(x_(n-1))) / (x_n - x_(n-1)), in place of the true derivative, although they apply it to second derivatives.)
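
A rough sketch of that in code (f here is whatever function you're finding the root of, e.g. double f(double x) { return x*x*x - 10; }):

double secantRoot(double x0, double x1, int iterations) {
  for (int i = 0; i < iterations; i++) {
    double slope = (f(x1) - f(x0)) / (x1 - x0); // secant slope in place of f'(x)
    double next = x1 - f(x1) / slope;
    x0 = x1;
    x1 = next;
  }
  return x1;
}

So, to answer the edit above: the secant method needs two seed points, not one; when you're at x_1, just pick an x_0 near it (even x_1 plus a small offset works).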

You typically only use a Newton-Raphson method when you have an analytical expression for the derivative. You have to be quite careful with a finite difference, as it adds significant stability issues.

Basically, if you don't know what you're doing, and by the sounds of it you don't, start with a bisection method. It does the same thing, but it is much simpler and always converges given a correct starting bracket.
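
For reference, a minimal bisection sketch (assuming f(lo) and f(hi) have opposite signs, i.e. the bracket actually contains a root):

double bisect(double lo, double hi, int iterations) {
  for (int i = 0; i < iterations; i++) {
    double mid = (lo + hi) / 2;
    if (f(lo) * f(mid) <= 0) {
      hi = mid; // sign change in [lo, mid], keep that half
    } else {
      lo = mid; // otherwise the root is in [mid, hi]
    }
  }
  return (lo + hi) / 2;
}

Each iteration halves the bracket, so it's slow, but it can't diverge.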

Thanks for the replies guys. I'm not really looking for performance here, so the number of iterations won't really matter to me. Couldn't I just use a static value to always start x_1 at? If performance isn't an issue then I don't see how much of a difference it could make.

Also:
[quote]You take the answer from the last iteration (x_n) and stick it into the next iteration (x_(n-1))[/quote]
I think you meant to switch those around(?)

The main purpose of me doing this is actually for later calc classes. I'm not currently in any calculus courses, but I like math and learn better by teaching myself (with your help, of course). Anyway, that derivative function could be represented in code as:


double f(double a) {
  return Math.pow(a, 3) - 10; // f(a) = a^3 - 10
}

double deriv(double X) {
  double a = X;
  double h = 1e-10; // small step (note: a very small h can run into the stability issues mentioned above)
  return (f(a + h) - f(a)) / h; // finite-difference approximation of f'(X)
}

So I could, with a constant value of 3 for x_1, create the Newton iteration like so:


double X = 3; // constant initial guess (x_1)

for (int i = 0; i < iterations; i++) {
  X -= f(X) / deriv(X); // Newton step: x_(n+1) = x_n - f(x_n)/f'(x_n)
}

Edit: I made a really stupid mistake, give me a second, fixing it…
I didn't use X for a when calculating the derivative. God, I'm so stupid sometimes…

I just tested this code and it works perfectly!

Regarding delt0r’s comment, I do know what I’m doing… kind of. That’s why I’m here, to understand it better. Thanks for the bisection method suggestion, I’ll check that out too!

One problem with Newton's method is that it is not guaranteed to converge, especially given a poor starting point. Wikipedia mentions using bisection to robustly get a rough estimate of the root and then continuing with a faster method such as Newton's. (Although Newton's can be slow to converge as well.)
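
Something like this, perhaps (a sketch of that hybrid idea, reusing the f and deriv functions from the code above; the iteration counts are arbitrary):

double hybridRoot(double lo, double hi) {
  for (int i = 0; i < 10; i++) { // bisection first: crude but reliable
    double mid = (lo + hi) / 2;
    if (f(lo) * f(mid) <= 0) hi = mid; else lo = mid;
  }
  double x = (lo + hi) / 2;
  for (int i = 0; i < 5; i++) { // then Newton to polish the estimate quickly
    x -= f(x) / deriv(x);
  }
  return x;
}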

Ah hell, it works… However, given the caveats you mentioned, why do neural network training methods like quick propagation use Newton's method instead of bisection-then-Newton like you said? It seems like that would be faster, which is a necessary thing with ANNs.

Maybe they already know a good starting estimate?

I suppose so… I haven’t worked extensively in that area but I have a friend who has so I’ll ask him. Anyways, thanks for all the help guys.

Np. As a quick aside, I framed it as a general optimization problem to play with my PSO lib:


Function<Double, Double> f = x -> Math.pow(x, 3) - 30; // find a root

Function<Vector, Double> costFunction = v -> -10000 / Math.abs(f.apply(v.get(0))); // minimize

double size = 1e30; // a bit excessive
Hyperrectangle searchSpace = Hyperrectangle.UNIT(1).scl(size).sub(size / 2);
System.out.println("Search space: " + searchSpace);

PSO.Candidate solution = PSO.optimize(costFunction, searchSpace)
    .with(100, PSO.PARTICLES)
    .with(2, PSO.THREADS)
    .withParams(.05, .2, .8)
    .until(10, TimeUnit.MILLISECONDS);
System.out.println("Solution: " + solution);

[quote]Search space: ([-5.0E29, 5.0E29])
Solution: f(3.107232505898543) = -6241379405622.752000
[/quote]
Actual value of f at estimated root: -1.6022099202928075E-9
Not too bad, but a dedicated method is much better.

Math.pow(x,INTEGER)? Stop.
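(For small fixed integer powers, plain multiplication is generally faster, e.g. x * x * x instead of Math.pow(x, 3).)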

It was pseudocode.