Are Sin/Cos lookup tables still relevant today with regards to performance?

Riven · June 11, 2013, 2:48pm

This is 10% slower than my ‘slow’ Taylor version.


	public static final float sinTaylor2(float rad) {
		double x = rad;
		double a0 = +1.0;
		double a1 = -1.666666666640169148537065260055e-1;
		double a2 = +8.333333316490113523036717102793e-3;
		double a3 = -1.984126600659171392655484413285e-4;
		double a4 = +2.755690114917374804474016589137e-6;
		double a5 = -2.502845227292692953118686710787e-8;
		double a6 = +1.538730635926417598443354215485e-10;
		double x2 = x * x;
		return (float) (x * (a0 + x2 * (a1 + x2 * (a2 + x2 * (a3 + x2 * (a4 + x2 * (a5 + x2 * a6)))))));
	}

	public static final float cosTaylor2(float rad) {
		return sinTaylor2(rad + SIN_TO_COS);
	}

Riven · June 11, 2013, 3:00pm

No :persecutioncomplex: but this is my latest attempt at reducing the dependency chain (still 3% slower than original)


	public static final float sinTaylor(float rad) {
		double x = rad;

		double x2 = x * x;
		double x6 = x2 * x2 * x2;
		double x3 = x * x * x;
		double x5 = x * x * x3;
		double x7 = x6 * x;
		double x9 = x6 * x3;
		double x11 = x6 * x5; // note the x6
		double x13 = x6 * x7;
		double x15 = x6 * x9;
		double x17 = x6 * x11;

		double val1 = x //
		   - x3 * 0.16666666666666666666666666666667//
		   + x5 * 0.00833333333333333333333333333333//
		   - x7 * 1.984126984126984126984126984127e-4//
		   + x9 * 2.7557319223985890652557319223986e-6;

		double val2 = //
			-x11 * 2.5052108385441718775052108385442e-8//
		   + x13 * 1.6059043836821614599392377170155e-10//
		   - x15 * 7.6471637318198164759011319857881e-13//
		   + x17 * 2.8114572543455207631989455830103e-15;

		return (float) (val1 + val2);
	}
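
For comparison, one common way to shorten the dependency chain without computing every odd power separately is to split the Horner evaluation into two independent halves that the CPU can work on in parallel (an Estrin-style split). The following is only a sketch under assumptions: the class name `SinSplit` is hypothetical, the coefficients are the plain Taylor terms up to x^15, and nothing here has been benchmarked against the versions above.

```java
final class SinSplit {
    // Plain Taylor coefficients (-1)^k / (2k+1)!, k = 1..7 (assumed, not tuned).
    private static final double C1 = -1.0 / 6.0;
    private static final double C2 = +1.0 / 120.0;
    private static final double C3 = -1.0 / 5040.0;
    private static final double C4 = +1.0 / 362880.0;
    private static final double C5 = -1.0 / 39916800.0;
    private static final double C6 = +1.0 / 6227020800.0;
    private static final double C7 = -1.0 / 1307674368000.0;

    static float sin(float rad) {
        double x = rad;
        double t = x * x;     // x^2
        double t2 = t * t;    // x^4
        double t4 = t2 * t2;  // x^8

        // Two short Horner chains with no data dependency between them.
        double lo = 1.0 + t * (C1 + t * (C2 + t * C3));
        double hi = C4 + t * (C5 + t * (C6 + t * C7));

        // sin(x) ~= x * (lo + x^8 * hi)
        return (float) (x * (lo + t4 * hi));
    }
}
```

Whether this actually beats the nested form depends on the JIT and the hardware, so it needs measuring like everything else in this thread.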

Nate · June 11, 2013, 6:27pm

FWIW, libgdx has sin/cos LUTs originally based on Riven’s:

libgdx/libgdx/blob/master/gdx/src/com/badlogic/gdx/math/MathUtils.java

Riven · June 11, 2013, 6:33pm

As you may have noticed, I adjusted my version with the edge case for right angles.

Nate · June 12, 2013, 12:09am

Cool. Where do you keep your latest version?

I compute cosine as the sine of the angle plus half PI, which doubles the accuracy for the same memory at the cost of one extra addition. Since everything is jammed inside the MathUtils class, I used static nested classes so the tables aren’t initialized unless actually used.
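
A minimal sketch of that layout, for readers who have not seen it: one sine table shared by both functions, with the table held in a nested class so the JVM only builds it on first use. This is not the libgdx source; the class name `SinLut`, the 12-bit table size, and the constant names are assumptions made for illustration.

```java
final class SinLut {
    private static final int BITS = 12;                 // assumed resolution
    private static final int SIZE = 1 << BITS;          // 4096 entries
    private static final int MASK = SIZE - 1;
    private static final float TWO_PI = (float) (Math.PI * 2);
    private static final float HALF_PI = (float) (Math.PI / 2);
    private static final float RAD_TO_INDEX = SIZE / TWO_PI;

    // Nested holder class: initialized on first access to TABLE,
    // not when SinLut itself is loaded.
    private static final class Holder {
        static final float[] TABLE = new float[SIZE];
        static {
            for (int i = 0; i < SIZE; i++) {
                TABLE[i] = (float) Math.sin(i * (2 * Math.PI / SIZE));
            }
        }
    }

    static float sin(float rad) {
        // The mask wraps the index, so any angle (including negative) is accepted.
        return Holder.TABLE[(int) (rad * RAD_TO_INDEX) & MASK];
    }

    static float cos(float rad) {
        // cos(x) = sin(x + pi/2): same table, one extra addition.
        return Holder.TABLE[(int) ((rad + HALF_PI) * RAD_TO_INDEX) & MASK];
    }
}
```

Dropping the separate cosine table is what frees the memory: the same budget can hold a sine table with twice as many entries.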

Riven · June 12, 2013, 12:31am

If you looked at my code, you’d have seen I did the same.

I never needed more accuracy than 12 bits, so I just halved the memory when I ditched the cos lookup table.

Lazy initialization of the tables is actually a nice feature, but only really useful for the atan2-LUT. Every single game will need fast sin/cos, so I don’t really see the point in lazy loading the sin-LUT.

I didn’t change that much… I think we went through every ‘diff’ by now

Nate · June 12, 2013, 3:39am

I tend to use the LUT in games and in libgdx itself, but other libgdx contributors don’t. In fact, they seem to actively change libgdx to not use the LUT. E.g., scene2d no longer uses it. I haven’t investigated their reasoning, but I assume they don’t feel it is accurate enough. Maybe when the errors compound?

Roquen · June 12, 2013, 1:18pm

Protip: Don’t come home at 4am, decide not to go to bed and instead write some code…it might look like this: http://pastebin.java-gaming.org/a71aa0c1f6a. That code is totally broken…I didn’t even check whether the minimax method had converged. But now, a few hours later, I realize that it doesn’t really matter. The point of throwing that together was to be able to do a side-by-side comparison of a polynomial of a given degree with the table-based versions, using Riven’s microbenchmarks as data points…so only the form and the number of terms really matter. All of these are in Horner form, so there is no attempt to break up dependency chains (thankfully; I can’t imagine what that might have ended up looking like).

Here is what I see using the 64-bit build 1.7.0_21-b11 with: -XX:MaxInlineSize=100 -XX:CompileThreshold=1000. I set the first option because the inlining limit might be why Riven is seeing a slowdown when breaking dependency chains, and you probably do want a higher value than the default in general (35 is what I think it is, and the number is the method size in bytes of Java bytecode). The second option is mostly to be avoided, as it cuts short the collection of profiling information and causes routines that probably don’t need to be compiled to be, well, compiled.

bad_sin_0 is just an equivalent of Riven’s original power series (without doubles, and in nested form) and is slightly faster than “half”.
bad_sin_3 is about the mid-point of full and half. This form on +/-{pi, pi/2, pi/4} has relative error values of ~{.0000167044, 5.31348*10^-9, 4.54084*10^-12}
bad_sin_5 is about the same speed as “full”. This form on +/-{pi, pi/2, pi/4} has relative error values of ~{.0194862, .000108162, 1.5071*10^-6}

Well, assuming I didn’t screw up again.
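
For anyone who wants to produce comparable accuracy figures for their own polynomial, here is a rough sketch of an error sweep over the same three ranges. Everything in it is an assumption: `PolyError` and `sinPoly` are hypothetical names, the polynomial is a plain degree-7 Taylor series in float Horner form (not the coefficients behind the bad_sin_N numbers above), and it reports the maximum absolute error against `Math.sin` rather than relative error, since relative error is ill-defined where sine crosses zero.

```java
final class PolyError {
    // Degree-7 Taylor polynomial for sine, float Horner form (illustrative only).
    static float sinPoly(float x) {
        float x2 = x * x;
        return x * (1f + x2 * (-1f / 6f + x2 * (1f / 120f + x2 * (-1f / 5040f))));
    }

    // Largest |sinPoly(x) - Math.sin(x)| over evenly spaced samples in [-limit, +limit].
    static double maxAbsError(double limit, int samples) {
        double worst = 0.0;
        for (int i = 0; i <= samples; i++) {
            float x = (float) (-limit + 2.0 * limit * i / samples);
            worst = Math.max(worst, Math.abs(sinPoly(x) - Math.sin(x)));
        }
        return worst;
    }

    public static void main(String[] args) {
        for (double limit : new double[] { Math.PI, Math.PI / 2, Math.PI / 4 }) {
            System.out.println("+/-" + limit + ": " + maxAbsError(limit, 10_000));
        }
    }
}
```

The error shrinks quickly as the range narrows, which is why the choice of range reduction matters as much as the number of terms.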