3DzzD MathX class Accurate and Fast cos/sin

EDIT there is an updated version for MS JVM in this thread, this one will give weird result if used with Microsoft JVM!

Inspired by previous posts : “Fast Math post” and “Jeff reducing angle technic” , I decide to share a class I created for the 3DzzD Web 3D Engine.

I will run about 15 times faster for random angle input than the original Math.cos, it will also give more accurate results.


package dzzd.utils;

/** 
 *  MathX class.
 *  

 * Base class used by 3DzzD Web 3D Engine to include faster and/or smarter maths functions

 * 

 * <u>Some results comparaison between Math.cos and MathX.cos function, MathX.cos seems to be 0.5 to 20 times faster than Math.cos function and give more accurate results especially for PI/2</u>

  *

 * <B>-PI</B>

 * MathX.cos(-Math.PI) =-1.0

 * Math.cos(-Math.PI)  =-1.0

 *

 * <B>PI</B>

 * MathX.cos(Math.PI) =-1.0

 * Math.cos(Math.PI)  =-1.0

 *

 * Root(2)/2=0.7071067811865476

 * <B>PI/4</B>

 * MathX.cos(Math.PI*0.25) =0.7071067811865476

 * Math.cos(Math.PI*0.25)  =0.7071067811865476

 *

 * <B>PI/2</B>

 * MathX.cos(Math.PI*0.25) =0.0

 * Math.cos(Math.PI*0.25)  =6.123233995736766E-17

 *

 * <B>-PI/4</B>

 * MathX.cos(-Math.PI*0.25)=0.7071067811865476

 * Math.cos(-Math.PI*0.25) =0.7071067811865476

 *

 * <B>-PI/2</B>

 * MathX.cos(-Math.PI*0.5) =0.0

 * Math.cos(-Math.PI*0.5)  =6.123233995736766E-17

 *

 *  @author Bruno Augier (DzzD)
 *  @version 1.0, 01/03/07
 *  @since 1.0
 * @see IRender3D
 * @see IMesh3D
 * @see DzzD
 * @see <a href="http://dzzd.net/">http://dzzd.net/</a>
 */

public class MathX
{
	
	private static double f2=-0.5;
	private static double f4=-f2/(3.0*4.0);
	private static double f6=-f4/(5.0*6.0);
	private static double f8=-f6/(7.0*8.0);
	private static double f10=-f8/(9.0*10.0);
	private static double f12=-f10/(11.0*12.0);
	private static double f14=-f12/(13.0*14.0);
	private static double f16=-f14/(15.0*16.0);
	private static double f18=-f16/(17.0*18.0);
	private static double f20=-f18/(19.0*20.0);
	private static double PI=Math.PI;
	private static double PI2=2.0*PI;
	private static double PI05=0.5*PI;

	/**
	* Compute and return sinus of its parameter using taylor serie
	* @param x angle in radian to 
	* @return sinus value for the given parameter
	*/
	public static double sin(double x)
	{
		return cos(x-PI05);
	}

	/**
	* Compute and return cosinus of its parameter using taylor serie
	* @param x angle in radian to 
	* @return cosinus value for the given parameter
	*/	
    public static double cos(double x)
    {
    	if(x<0.0) x=-x;
    	if(x<PI2) 
    	{
    		if(x<PI)
    		{	
				double x2=x*x;
				return 1.0+x2*(f2+x2*(f4+x2*(f6+x2*(f8+x2*(f10+x2*(f12+x2*(f14+x2*(f16+x2*(f18+x2*f20)))))))));    	
			}
			else
			{
				x-=PI;
				double x2=x*x;
				return -(1.0+x2*(f2+x2*(f4+x2*(f6+x2*(f8+x2*(f10+x2*(f12+x2*(f14+x2*(f16+x2*(f18+x2*f20))))))))));    	
			}
    	}
    	x%=PI2;
    	x-=PI;
		double x2=x*x;
		return -(1.0+x2*(f2+x2*(f4+x2*(f6+x2*(f8+x2*(f10+x2*(f12+x2*(f14+x2*(f16+x2*(f18+x2*f20))))))))));
	}	
		
}

Nice mathematical approach, but how can this beat:


   public static final float cos(float rad) {
      return cos[(int) (rad * radToIndex) & SIN_MASK];
   }

with 16 bits (64k table entries) it seems to be accurate enough for any (game) purpose

sorry but not enought accurate for me :stuck_out_tongue:

Lookup are good only on the last pass of a program, world and other transforms must be very very accurate, I already think that float are not enought accurate so lookup tables … is not for me :wink: , exept fot the last pass of a renderer just before drawing.

Imagine a very huge world with very little object and very huge object lookup table will result in weird frame overlapped objects and such things.

In conclusion: offcourse lookup tables are faster but lookup tables are not used for the same goal.

notice: I found recently that accessing an array randomly is a lot slower than accessing it sequentialy, anyway it is still a lot faster than compute cos using double.

note that double computation are done by the aritmetic coprocessor while the cpu continue to works.

Hmmmm. I just reinvented the wheel again. For some reason this post was not picked up on the search results, only the trig table fast math, which is not an approach I can use.
Anyway here is my FastMath class.
it implements cos sin acos asin and atan (probably should add tan for completness). Computes to arbitary precision for floats. I don’t get a massive speedup for the sin and cos, so I will nick DzzDs, but at least a x5 performance gain on the inverse functions, even to the maximum precision. Its acos that is causing an issues with JOODE.

Actually I tried your version. It speeds up for precision over 1E-5, otherwise my version is faster (coz it is as lazy as possible at expanding the Taylors). On my computer my version at the most precise runs at: 159.2312 and your cos runs at 144, and sun’s runs at 170.706. Acos though is my bottleneck anyway.

Tom

sign = sign==1?-1:1; //flip sign every time
this saves you an if which are damn expensive
sign *= -1; //flip sign every time

and if you ‘inline’ the contents of Math.abs(), you will get very likely a significant speedup too

Cheers for the tips. I added you sign tip. What do you mean by ‘inline’ ? Write my own version?

I think he means, use

if(oldTermABS<Math.abs(term)){

instead of

termABS = Math.abs(term);
if(oldTermABS<termABS){

etc ?

Would that really make any odds? Surely the compiler is smarter than that, plus I did not want to risk that value being recomputed for the while loop as well as the if?

Tom

OK. I ran both optimization. My original versions are quicker in both cases. i.e. using the ? is quicker than integer multiplication, and storing the result of abs is better than recomputing it every time I need it.
Maybe this is not the case for everyone. I am running on a core duo. It be really nice to have a self optimizing FastMath class…

My basic benchmarker is below just in case anyone else want to play with the code, you will note that if you run acos to a higher precision than in the test block, that it starts to disagree with the libraries answers, I don’t understand what the reason would be, unless I am technically adding the float in the wrong order (smallest first I think reduces numerical errors)
Tom


package net.java.dev.joode.test.util;

import junit.framework.TestCase;
import net.java.dev.joode.util.FastMath;
import net.java.dev.joode.util.StopWatch;

/**
 */
public class FastMathTest extends TestCase {

    public void testFactorial(){
        assertTrue(FastMath.factorial(3) == 1*2*3);
        assertTrue(FastMath.factorial(4) == 1*2*3*4);
        assertTrue(FastMath.factorial(0) == 1);
    }

    public void testSin(){
        for(int i=0;i<1000;i++){
            float x = (float) (Math.random()*Math.PI*2);

            System.out.println("x = " + x);

            float precise = (float) Math.sin(x);
            float approx  = FastMath.sin(x);

            System.out.println("precise = " + precise);
            System.out.println("approx = " + approx);

            float error = Math.abs(precise-approx);

            assertTrue(error < FastMath.getPrecision());
        }
    }

    public void testCos(){
        for(int i=0;i<1000;i++){
            float x = (float) (Math.random()*Math.PI*2);

            System.out.println("x = " + x);

            float precise = (float) Math.cos(x);
            float approx  = FastMath.cos(x);

            System.out.println("precise = " + precise);
            System.out.println("approx = " + approx);

            float error = Math.abs(precise-approx);

            assertTrue(error < FastMath.getPrecision());
        }
    }

    public void testATan(){
        for(int i=0;i<1000;i++){
            float x = (float) (Math.random()*2-1);

            System.out.println("x = " + x);

            float precise = (float) Math.atan(x);
            float approx  = FastMath.atan(x);

            System.out.println("precise = " + precise);
            System.out.println("approx = " + approx);

            float error = Math.abs(precise-approx);

            assertTrue(error < FastMath.getPrecision());
        }
    }

    public void testACos(){
        for(int i=0;i<1000;i++){
            float x = (float) (Math.random()*2-1);

            System.out.println("x = " + x);

            float precise = (float) Math.acos(x);
            float approx  = FastMath.acos(x);

            System.out.println("precise = " + precise);
            System.out.println("approx = " + approx);

            float error = Math.abs(precise-approx);

            assertTrue(error < FastMath.getPrecision());
        }
    }

    public void testSinSpeed(){
        int num = 10000;

        StopWatch libraryTimer = new StopWatch();
        StopWatch fastMathTimer = new StopWatch();



        float [] input = new float[num];
        float [] preciseResults = new float[num];
        float [] approxResults = new float[num];


        for(float precision=10; precision > .000001; precision*=.1f){

            FastMath.setPrecision(precision);
            
            for(int i=0;i<num;i++){
                input[i] = (float) (Math.random()*Math.PI*2);
            }

            libraryTimer.reset();
            libraryTimer.start();
            for(int i=0;i<num;i++){
                preciseResults[i] = (float)Math.sin(input[i]);
            }
            libraryTimer.stop();
            fastMathTimer.reset();
            fastMathTimer.start();
            for(int i=0;i<num;i++){
                approxResults[i] = FastMath.sin(input[i]);
            }
            fastMathTimer.stop();
            //check results
            for(int i=0;i<num;i++){
                float error = Math.abs(preciseResults[i]-approxResults[i]);
                assertTrue(error < FastMath.getPrecision());
            }

            System.out.println("precision = " + FastMath.getPrecision());
            System.out.println("average time for API.sin is " + libraryTimer.time()/(float)num);
            System.out.println("av time for FastMath.sin is " + fastMathTimer.time()/(float)num);
        }

    }

    public void testCosSpeed(){
        int num = 10000;

        StopWatch libraryTimer = new StopWatch();
        StopWatch fastMathTimer = new StopWatch();



        float [] input = new float[num];
        float [] preciseResults = new float[num];
        float [] approxResults = new float[num];


        for(float precision=10; precision > .000001; precision*=.1f){

            FastMath.setPrecision(precision);

            for(int i=0;i<num;i++){
                input[i] = (float) (Math.random()*Math.PI*2);
            }

            libraryTimer.reset();
            libraryTimer.start();
            for(int i=0;i<num;i++){
                preciseResults[i] = (float)Math.cos(input[i]);
            }
            libraryTimer.stop();
            fastMathTimer.reset();
            fastMathTimer.start();
            for(int i=0;i<num;i++){
                approxResults[i] = FastMath.cos(input[i]);
            }
            fastMathTimer.stop();
            //check results
            for(int i=0;i<num;i++){
                float error = Math.abs(preciseResults[i]-approxResults[i]);
                assertTrue(error < FastMath.getPrecision());
            }

            System.out.println("precision = " + FastMath.getPrecision());
            System.out.println("average time for API.cos is " + libraryTimer.time()/(float)num);
            System.out.println("av time for FastMath.cos is " + fastMathTimer.time()/(float)num);
        }

    }

    public void testACosSpeed(){
        int num = 1000000;

        StopWatch libraryTimer = new StopWatch();
        StopWatch fastMathTimer = new StopWatch();

        float [] input = new float[num];
        float [] preciseResults = new float[num];
        float [] approxResults = new float[num];


        for(float precision=10; precision > .00001; precision*=.1f){

            FastMath.setPrecision(precision);

            for(int i=0;i<num;i++){
                input[i] = (float) (Math.random()*2-1);
            }

            libraryTimer.reset();
            libraryTimer.start();
            for(int i=0;i<num;i++){
                preciseResults[i] = (float)Math.acos(input[i]);
            }
            libraryTimer.stop();
            fastMathTimer.reset();
            fastMathTimer.start();
            for(int i=0;i<num;i++){
                approxResults[i] = FastMath.acos(input[i]);
            }
            fastMathTimer.stop();
            //check results
            for(int i=0;i<num;i++){
                float error = Math.abs(preciseResults[i]-approxResults[i]);
                assertTrue("error="+error,error < FastMath.getPrecision());
            }

            System.out.println("precision = " + FastMath.getPrecision());
            System.out.println("average time for API.acos is " + libraryTimer.time()/(float)num);
            System.out.println("av time for FastMath.acos is " + fastMathTimer.time()/(float)num);
        }

    }
}

Tom


Actually I managed to speed it up a reasnable amount by getting rid of the sign bit with some loop unrolling


public static float atan(float x){
        //http://en.wikipedia.org/wiki/Arctangent#Infinite_series
        //atan(x) = x - x^3/3 + x^5/5
        float pow=x;
        int n=1;
        //int sign=1;
        float term=x;
        float termABS=Math.abs(term);
        float val =term;

        while(termABS > precision){
            //go up two each time
            n+=2;
            pow*=x*x;
            //sign = sign==1?-1:1;//flip sign every time

            float oldTermABS = termABS;
            //calc next term
            if(n<MAX_EXPANSIONS)
                term=/*sign*/pow*recip[n];
            else
                term=/*sign*/pow/n;

            termABS = Math.abs(term);

            if(oldTermABS<termABS){
                //precision is worsening!
                //caused by pow getting very large, and
                //factorial getting very small
                break;//stop now, things are getting worse!
            }
            val -= term;

            //go up two each time
            n+=2;
            pow*=x*x;
            //sign = sign==1?-1:1;//flip sign every time

            oldTermABS = termABS;
            //calc next term
            if(n<MAX_EXPANSIONS)
                term=/*sign*/pow*recip[n];
            else
                term=/*sign*/pow/n;

            termABS = Math.abs(term);

            if(oldTermABS<termABS){
                //precision is worsening!
                //caused by pow getting very large, and
                //factorial getting very small
                break;//stop now, things are getting worse!
            }
            val += term;
        }
        return val;
    }


precision = 1.0
average time for API.acos is 1122.7174
av time for FastMath.acos is 155.27678
precision = 0.1
average time for API.acos is 1118.1633
av time for FastMath.acos is 178.82118
precision = 0.010000001
average time for API.acos is 1124.1154
av time for FastMath.acos is 200.12914
precision = 0.0010
average time for API.acos is 1121.4628
av time for FastMath.acos is 227.57458
precision = 1.00000005E-4
average time for API.acos is 1118.9564
av time for FastMath.acos is 241.88194
precision = 1.0000001E-5
average time for API.acos is 1117.9333
av time for FastMath.acos is 263.41922

NB:

Why did you use float rather than double ?

in most case double are at least as fast as float (usually faster fo me) and give more accurate results , no ? (at least on all computers I done my bench AMD 2000XP+(1.6GHz) and pentium4 (2Ghz).

and yes… I know…sorry… I dont know about the reference of the arytmetic coprocessor for both commputers above…

Yeah…floats…Um, JOODE was built using floats in the beginning so it all is now :-. Whether that actually helps anything I am not sure. I naively thought it would be faster when I started. Maybe it saves a bit of space, but the really memory is not a problem with JOODE, speed is the main issue (not a huge issue though). A quick “find an replace” can convert JOODE over to doubles, but I would really prefer a better way (say user selectable, or even self optimizable). I need something like a preprocessor but with IDE support.

3DzzD API have a bit less than one undred of classes and one day with a lot of motivation… I done the find and replace method (I only keep float for local object vertex to keep a bit more memory)… it is a long process (could be a cupple of hours) as it introduce some bugs and bottleneck, for exemple casting a double to a int when the double is higher than Integer.MAX_VALUE or is smaller than -Integer.MAX_VALUE cost a lot of time… so when you know that it can append it is a good way to do if(double<2147483647.0 && double>-2147483647) int=(double)double else … in an other hands some bugs due to lack of precision was self corrected by using double rather than float…

making both available is certainly the best way but I really dont know how you can achieve this ::slight_smile:

EDIT: NB: float-> double==fast double=>float == slow. to resume : higher precision to lower precision is slow, and, lower precision to higher presion is fast

I can’t be bothered to profile but I’ll wager to say that…

…modern CPUs should easily branch-predict this one…

…calls to Math.abs() ought to be auto-inlined by the JVM provided your code runs often enough…

Yeah, I actually managed to get rid of all the abs calls in exchange for an if at the beggining of the method call. I will leave off posting it becuase it provides only a marginal performance gain.

This is an update of first version, needed If you plan to use it with Micrososft JVM.

Micrososft JVM will give wrong result with the first version (dont ask me why…), this version is less precise but faster…

NB: Results in comments are given for the high precision version, not this one, but anyways it is still very precise…


package net.dzzd.utils;

/** 
 *  MathX class.
 *  

 * Base class used by 3DzzD Web 3D Engine to include faster and/or smarter maths functions

 * 

 * <u>Some results comparaison between Math.cos and MathX.cos function, MathX.cos seems to be 0.5 to 20 times faster than Math.cos function and give more accurate results especially for PI/2</u>

  *

 * <B>-PI</B>

 * MathX.cos(-Math.PI) =-1.0

 * Math.cos(-Math.PI)  =-1.0

 *

 * <B>PI</B>

 * MathX.cos(Math.PI) =-1.0

 * Math.cos(Math.PI)  =-1.0

 *

 * Root(2)/2=0.7071067811865476

 * <B>PI/4</B>

 * MathX.cos(Math.PI*0.25) =0.7071067811865476

 * Math.cos(Math.PI*0.25)  =0.7071067811865476

 *

 * <B>PI/2</B>

 * MathX.cos(Math.PI*0.5) =0.0

 * Math.cos(Math.PI*0.5)  =6.123233995736766E-17

 *

 * <B>-PI/4</B>

 * MathX.cos(-Math.PI*0.25)=0.7071067811865476

 * Math.cos(-Math.PI*0.25) =0.7071067811865476

 *

 * <B>-PI/2</B>

 * MathX.cos(-Math.PI*0.5) =0.0

 * Math.cos(-Math.PI*0.5)  =6.123233995736766E-17

 *

 *  @author Bruno Augier (DzzD)
 *  @version 1.0, 01/03/07
 *  @since 1.0
 * @see IRender3D
 * @see IMesh3D
 * @see DzzD
 * @see <a href="http://dzzd.net/">http://dzzd.net/</a>
 */

public class MathX
{
	public static double PI=Math.PI;

	private static double f2=-0.5;
	private static double f4=-f2/(3.0*4.0);
	private static double f6=-f4/(5.0*6.0);
	private static double f8=-f6/(7.0*8.0);
	private static double f10=-f8/(9.0*10.0);
	private static double f12=-f10/(11.0*12.0);
	private static double f14=-f12/(13.0*14.0);
	private static double f16=-f14/(15.0*16.0);
	//MORE PRECISE BUT NOT COMPATIBLE JVM MS =>//private static double f18=-f16/(17.0*18.0);
	//MORE PRECISE BUT NOT COMPATIBLE JVM MS =>//private static double f20=-f18/(19.0*20.0);
	private static double PI2=2.0*PI;
	private static double PI05=0.5*PI;
	

	/**
	* Compute and return sinus of its parameter using taylor serie
	* @param x angle in radian to 
	* @return sinus value for the given parameter
	*/
	public static double sin(double x)
	{
		//return Math.sin(x);
		return cos(x-PI05);
	}

	/**
	* Compute and return cosinus of its parameter using taylor serie
	* @param x angle in radian to 
	* @return cosinus value for the given parameter
	*/	
    public static double cos(double x)
    {
    	
    	
    	if(x<0.0) x=-x;
    	
    	
    	if(x<PI2) 
    	{
    		if(x<PI)
    		{	
				double x2=x*x;
				return 1.0+x2*(f2+x2*(f4+x2*(f6+x2*(f8+x2*(f10+x2*(f12+x2*(f14+x2*(f16))))))));
				//MORE PRECISE BUT NOT COMPATIBLE JVM MS => return 1.0+x2*(f2+x2*(f4+x2*(f6+x2*(f8+x2*(f10+x2*(f12+x2*(f14+x2*(f16+x2*(f18+x2*f20)))))))));    	
			}
			else
			{
				x-=PI;
				double x2=x*x;
				return -(1.0+x2*(f2+x2*(f4+x2*(f6+x2*(f8+x2*(f10+x2*(f12+x2*(f14+x2*(f16)))))))));
				//MORE PRECISE BUT NOT COMPATIBLE JVM MS => return -(1.0+x2*(f2+x2*(f4+x2*(f6+x2*(f8+x2*(f10+x2*(f12+x2*(f14+x2*(f16+x2*(f18+x2*f20))))))))));    	
			}
    	}
    	
    	
    	x%=PI2;
    	x-=PI;
		double x2=x*x;
		
		return -(1.0+x2*(f2+x2*(f4+x2*(f6+x2*(f8+x2*(f10+x2*(f12+x2*(f14+x2*f16))))))));
		//MORE PRECISE BUT NOT COMPATIBLE JVM MS => return -(1.0+x2*(f2+x2*(f4+x2*(f6+x2*(f8+x2*(f10+x2*(f12+x2*(f14+x2*(f16+x2*(f18+x2*f20))))))))));
		
	}	
		
}