Are Sin/Cos lookup tables still relevant today with regards to performance?

This is 10% slower than my ‘slow’ Taylor version.


	public static final float sinTaylor2(float rad) {
		double x = rad;
		double a0 = +1.0;
		double a1 = -1.666666666640169148537065260055e-1;
		double a2 = +8.333333316490113523036717102793e-3;
		double a3 = -1.984126600659171392655484413285e-4;
		double a4 = +2.755690114917374804474016589137e-6;
		double a5 = -2.502845227292692953118686710787e-8;
		double a6 = +1.538730635926417598443354215485e-10;
		double x2 = x * x;
		return (float) (x * (a0 + x2 * (a1 + x2 * (a2 + x2 * (a3 + x2 * (a4 + x2 * (a5 + x2 * a6)))))));
	}

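	public static final float cosTaylor2(float rad) {
		// SIN_TO_COS is defined elsewhere in the class; presumably half PI,
		// since cos(x) = sin(x + PI/2).
		return sinTaylor2(rad + SIN_TO_COS);
	}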

No :persecutioncomplex: but this is my latest attempt at reducing the dependency chain (still 3% slower than original)


	public static final float sinTaylor(float rad) {
		double x = rad;

		double x2 = x * x;
		double x6 = x2 * x2 * x2;
		double x3 = x * x * x;
		double x5 = x * x * x3;
		double x7 = x6 * x;
		double x9 = x6 * x3;
		double x11 = x6 * x5; // note the x6
		double x13 = x6 * x7;
		double x15 = x6 * x9;
		double x17 = x6 * x11;

		double val1 = x //
		   - x3 * 0.16666666666666666666666666666667//
		   + x5 * 0.00833333333333333333333333333333//
		   - x7 * 1.984126984126984126984126984127e-4//
		   + x9 * 2.7557319223985890652557319223986e-6;

		double val2 = //
			-x11 * 2.5052108385441718775052108385442e-8//
		   + x13 * 1.6059043836821614599392377170155e-10//
		   - x15 * 7.6471637318198164759011319857881e-13//
		   + x17 * 2.8114572543455207631989455830103e-15;

		return (float) (val1 + val2);
	}
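
For comparison, another way to shorten the chain is Estrin's scheme, which is not what the code above does. It pairs up terms so that several multiply-adds are independent of each other. What follows is only a minimal sketch: the class name, method name and constants are made up for illustration, it uses plain Taylor coefficients up to x^13, and whether it is actually faster on a given JVM would have to be measured.

```java
final class EstrinSinSketch {
    // Plain Taylor coefficients of sin(x)/x in powers of y = x*x (terms up to x^13).
    private static final double C0 = 1.0;
    private static final double C1 = -1.0 / 6.0;
    private static final double C2 = 1.0 / 120.0;
    private static final double C3 = -1.0 / 5040.0;
    private static final double C4 = 1.0 / 362880.0;
    private static final double C5 = -1.0 / 39916800.0;
    private static final double C6 = 1.0 / 6227020800.0;

    static float sinEstrin(float rad) {
        double x = rad;
        double y = x * x;
        double y2 = y * y;
        double y4 = y2 * y2;

        // These three pairs only depend on y, not on each other.
        double p01 = C0 + C1 * y;
        double p23 = C2 + C3 * y;
        double p45 = C4 + C5 * y;

        // Combine the pairs: low covers y^0..y^3, high covers y^4..y^6.
        double low = p01 + y2 * p23;
        double high = p45 + y2 * C6;

        return (float) (x * (low + y4 * high));
    }

    public static void main(String[] args) {
        float x = (float) (Math.PI / 4);
        System.out.println(sinEstrin(x) + " vs " + (float) Math.sin(x));
    }
}
```

The longest chain here is x -> y -> y2 -> y4 -> (y4 * high) -> sum -> final multiply, which is shorter than the fully nested Horner form, at the cost of a few extra multiplies.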

FWIW, libgdx has sin/cos LUTs originally based on Riven’s:

As you may have noticed, I adjusted my version with the edge case for straight corners.

Cool. :slight_smile: Where do you keep your latest version?

For cosine I look up sin(rad + half PI) in the same table, which doubles the accuracy for the same memory at the cost of one extra addition. Since everything is jammed inside the MathUtils class, I put the tables in static nested classes so they aren’t initialized unless actually used.
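
To make those two ideas concrete, here is a rough sketch of a single sine table that also serves cosine through a half PI offset, with the table living in a nested holder class so it is only built on first use. It is not the libgdx code: the class name, the 12-bit table size and the constant names are all assumptions chosen for illustration.

```java
final class SinLutSketch {
    private static final int SIN_BITS = 12;               // assumed table size: 4096 entries
    private static final int SIN_COUNT = 1 << SIN_BITS;
    private static final int SIN_MASK = SIN_COUNT - 1;
    private static final float RAD_TO_INDEX = (float) (SIN_COUNT / (2 * Math.PI));
    private static final float HALF_PI = (float) (Math.PI / 2);

    // The JVM initializes a nested class on first access, so the table
    // is not built unless sin() or cos() is actually called.
    private static final class SinTable {
        static final float[] TABLE = new float[SIN_COUNT];

        static {
            for (int i = 0; i < SIN_COUNT; i++) {
                TABLE[i] = (float) Math.sin(i * 2 * Math.PI / SIN_COUNT);
            }
        }
    }

    static float sin(float rad) {
        // The mask wraps the index into the table, including for negative angles.
        return SinTable.TABLE[(int) (rad * RAD_TO_INDEX) & SIN_MASK];
    }

    static float cos(float rad) {
        // Same table, shifted by half PI: cos(x) = sin(x + PI/2).
        return SinTable.TABLE[(int) ((rad + HALF_PI) * RAD_TO_INDEX) & SIN_MASK];
    }

    public static void main(String[] args) {
        System.out.println(sin(1.0f) + " vs " + Math.sin(1.0));
        System.out.println(cos(1.0f) + " vs " + Math.cos(1.0));
    }
}
```

With truncating index conversion, the worst-case error of this sketch is roughly one table step (2 PI / 4096, about 0.0015).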

If you looked at my code, you’d have seen I did the same :slight_smile:

I never needed more accuracy than 12 bits, so I just halved the memory when I ditched the cos lookup table.

[quote=“Nate,post:25,topic:42462”]
That’s actually a nice feature, but only really useful for the atan2-LUT. Every single game will need fast sin/cos, so I don’t really see the point in lazy loading the sin-LUT.

I didn’t change that much… I think we went through every ‘diff’ by now :slight_smile:

I tend to use the LUT in games and libgdx itself, but other libgdx contributors don’t. In fact they seem to actively change libgdx to not use the LUT. E.g., scene2d no longer uses it. I haven’t investigated their reasoning, but I assume they don’t feel it is accurate enough. Maybe when the errors compound?

Protip: Don’t come home at 4am, decide not to go to bed and instead write some code…it might look like this: http://pastebin.java-gaming.org/a71aa0c1f6a. That code is totally broken…I didn’t even check that the min-max method had converged. But now, a few hours later, I realize that it doesn’t really matter. The point of throwing that together was to be able to do a side-by-side comparison of a polynomial of a given degree with the table-based versions, using Riven’s microbenchmarks as data points…so only the form and the number of terms really matter. All of these are formulated in Horner form, so there is no attempt to break up dependency chains (thankfully; I can’t imagine what that might have ended up looking like).

Here is what I see using the 64-bit build 1.7.0_21-b11 with: -XX:MaxInlineSize=100 -XX:CompileThreshold=1000. I set the first because the default might be why Riven is seeing a slowdown when breaking dependency chains, and you probably do want a higher value than the default in general (35, I think; the number is the method’s size in bytes of Java bytecode). The second option is mostly to be avoided, as it cuts short the collection of profiling information and causes routines that probably don’t need to be compiled to be, well, compiled.

bad_sin_0 is just an equivalent to Riven’s original power series (minus doubles and nested) and is slightly faster than “half”.
bad_sin_3 is about the mid-point of full and half. This form on +/-{pi, pi/2, pi/4} has relative error values of ~{1.67044*10^-5, 5.31348*10^-9, 4.54084*10^-12}
bad_sin_5 is about the same speed as “full”. This form on +/-{pi, pi/2, pi/4} has relative error values of ~{1.94862*10^-2, 1.08162*10^-4, 1.5071*10^-6}

Well, assuming I didn’t screw up again.
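
For anyone wanting to produce this kind of number themselves, here is a small sketch of measuring the error at pi, pi/2 and pi/4 against Math.sin. Since the bad_sin_* code is only in the pastebin, the sketch uses the sinTaylor2 polynomial from earlier in the thread as a stand-in; the class and helper names are made up. It reports absolute error as well, because sin is almost zero at pi and dividing by it makes a plain relative error blow up there.

```java
final class ErrorCheckSketch {
    // Horner-form polynomial copied from sinTaylor2 earlier in the thread.
    static float sinPoly(float rad) {
        double x = rad;
        double a0 = +1.0;
        double a1 = -1.666666666640169148537065260055e-1;
        double a2 = +8.333333316490113523036717102793e-3;
        double a3 = -1.984126600659171392655484413285e-4;
        double a4 = +2.755690114917374804474016589137e-6;
        double a5 = -2.502845227292692953118686710787e-8;
        double a6 = +1.538730635926417598443354215485e-10;
        double x2 = x * x;
        return (float) (x * (a0 + x2 * (a1 + x2 * (a2 + x2 * (a3 + x2 * (a4 + x2 * (a5 + x2 * a6)))))));
    }

    // Absolute difference from Math.sin evaluated at the same float input.
    static double absError(float rad) {
        return Math.abs(sinPoly(rad) - Math.sin(rad));
    }

    // Relative error; only meaningful where sin(rad) is not close to zero.
    static double relError(float rad) {
        return absError(rad) / Math.abs(Math.sin(rad));
    }

    public static void main(String[] args) {
        float[] points = {(float) Math.PI, (float) (Math.PI / 2), (float) (Math.PI / 4)};
        for (float p : points) {
            System.out.printf("x = %.6f  abs = %.3e  rel = %.3e%n", p, absError(p), relError(p));
        }
    }
}
```

Because the result is rounded to float, errors below roughly 6e-8 cannot be resolved this way; to see smaller values the polynomial would have to return a double.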