Extremely Fast sine/cosine

And it is )
Open source != work for free.
Open source != you ought someone something.
Open source == help if you want.

Imho: It doesn’t: public open source is same like == Dota, CS
Any one can enter -
Many unmannered and unethical – no respect , only EGO

I am as easy to flatter as the next guy, but I think you overdid it slightly. You decided to name two amongst the best two ? What kind of marketing ploy is that! All I know is it worked and I will be having sweet dreams.

Roquen

One end of the spectrum is to post in extreme detail and provide a complete overview on the topic that even the stupidest of people can understand (high effort / low reward for effort) and at the other end is to post a set of facts without any explanation at all and no detail as to why the facts are relevant to the conversation (very low effort / still low reward for effort). You tend to post very much on the later side of the scale, the fact that there is little or no explanation means you get very little interaction with the community even if the community is knowledgeable in the field you are posting about.

I’m not asking you to change your style as its a personal choice and you are free to post what you want and in any style you want but I wanted to illustrate why people get frustrated with the confusion nature of your posts.

This also applies to the reader’s decisions about what to read!

Yes, organizing the presentation of material so that it can be grasped easily takes time, is at the heart of a good writer’s craft.

All this said, I have found many of Roquen’s posts to be helpful. Some could be more helpful, but that is an ongoing challenge for all of us.

Medal for being a smartarse! ::slight_smile:

Cas :slight_smile:

(eval (macroexpand roquen))

EDIT: sumOfAngles wasn’t computing the full data set :emo:

Code: (how bad is it?) http://pastebin.java-gaming.org/af3b23b7d3a1b

Results:
Benchmark Mode Samples Score Score error Units b.Test.scalarFastMath avgt 10 2514.448 255.431 us/op b.Test.scalarJavaMath avgt 10 80388.179 350.451 us/op b.Test.sumOfAngles avgt 10 3061.736 24.541 us/op

Couldn’t get jitwatch working though, so it probably could be better.

@BurntPizza, would you mind including theagentd’s interpolation method also in your benchmark, as in the following:


    public static final float PI = (float) java.lang.Math.PI;
    private static final float PI2 = PI * 2;
    private static final float PIHalf = PI * 0.5f;
    private static final int lookupBits = 9;
    private static final int lookupTableSize = 1 << lookupBits;
    private static final int lookupTableSizeMinus1 = lookupTableSize - 1;
    private static final int lookupTableSizeWithMargin = lookupTableSize + 1;
    private static final float sinTable[] = new float[lookupTableSizeWithMargin];
    private static final float pi2OverLookupSize = PI2 / lookupTableSize;
    private static final float lookupSizeOverPi2 = lookupTableSize / PI2;
    static {
        for (int i = 0; i < lookupTableSizeWithMargin; i++) {
            double d = i * pi2OverLookupSize;
            sinTable[i] = (float) java.lang.Math.sin(d);
        }
    }
    public static float sin(float rad) {
        float index = rad * lookupSizeOverPi2;
        int ii = (int) index;
        float alpha = index - ii;
        int i = ii & lookupTableSizeMinus1;
        float sin1 = sinTable[i];
        float sin2 = sinTable[i + 1];
        return sin1 * (1.0f - alpha) + sin2 * alpha;
    }
    public static float cos(float rad) {
        return sin(rad + PIHalf);
    }

[quote=“BurntPizza,post:66,topic:55153”]
What was the problem exactly? I cleaned your code a bit and verified output equivalence of the 3 implementations and got:

scalarFastMath    3110,062 us/op
scalarJavaMath  101201,482 us/op
sumOfAngles       2280,336 us/op

Notice the loop bounds on the old pastebin: [icode]for (int i = 0; i < state.N; i += 4) { // derp; should be N * 4[/icode]

Write those unit tests kids!

@KaiHH

Benchmark Mode Samples Score Score error Units b.Test.scalarAgentMath avgt 10 5209.268 103.278 us/op b.Test.scalarJavaMath avgt 10 83709.325 960.562 us/op b.Test.scalarRivenMath avgt 10 2654.518 266.021 us/op b.Test.sumOfAngles avgt 10 3174.081 396.252 us/op

@Spasi

Could I see your sumOfAngles implementation? I’m curious as to how fastMath and sumOfAngles switched places.
I’m running this on jdk1.8_05, i7 4700MQ

Latest: http://pastebin.java-gaming.org/f3b2b4d7a3b1a

Using your latest version:

Test.scalarRivenMath 2970,533 us/op
Test.sumOfAngles     2346,007 us/op

[quote=“BurntPizza,post:69,topic:55153”]

public void sumOfAngles(PrecompState state) {
	state.t += 1 / 60f;
  
	float sint = Riven.sin(state.t);
	float cost = Riven.cos(state.t);
	for (int i = 0; i < state.N; i++) {
		float cosx = state.in[i * 4 + 0];
		float sinx = state.in[i * 4 + 1];
		float cosy = state.in[i * 4 + 2];
		float siny = state.in[i * 4 + 3];
		state.out[i * 2 + 0] = cosx * sint + sinx * cost;
		state.out[i * 2 + 1] = cosy * sint + siny * cost;
	}
}

Results:

Test.scalarRivenMath 2972,617 us/op
Test.sumOfAngles     2227,220 us/op

Running on a 3.1GHz Sandy Bridge i5 with Java 8u60 x64.

Results on MBP 2015 (3.1GHz Broadwell i7, JDK 8u40 x64)

// latest version
scalarRivenMath  2005.062 ± 122.597  us/op
sumOfAngles      2192.988 ± 404.607  us/op
// my sumOfAngles
scalarRivenMath  2024.594 ± 233.004  us/op
sumOfAngles      2031.853 ± 146.233  us/op

Disclaimer: I saw high variance between runs, which I attribute to CPU thermal throttling. Had to let the laptop cool down a bit before getting consistent results. This might be affecting you too.

The problem seems to be related to less efficient cache use on laptop CPUs. sumOfAngles becomes almost 40% faster with 1/10th the workload (N = 100_000).

Use ThrottleStop. It saved me when my computer was throttling during games. You can even underclock your computer with it to get consistent results.

Hmm, things definitely get more interesting at lower input sizes. That caching-[regression?] is pretty disappointing.

http://pastebin.java-gaming.org/3b2bd5a7b3a16

`
N = 1_000_000

Benchmark Mode Samples Score Score error Units
b.Test.scalarAgentMath avgt 10 5433.443 24.239 us/op
b.Test.scalarJavaMath avgt 10 89490.975 592.976 us/op
b.Test.scalarRivenMath avgt 10 2642.146 60.246 us/op
b.Test.sumOfAngles avgt 10 3321.570 23.815 us/op
b.Test.sumOfAnglesSOA avgt 10 3300.369 46.918 us/op

N = 100_000

Benchmark Mode Samples Score Score error Units
b.Test.scalarAgentMath avgt 10 507.983 4.833 us/op
b.Test.scalarJavaMath avgt 10 8739.558 67.833 us/op
b.Test.scalarRivenMath avgt 10 219.087 4.151 us/op
b.Test.sumOfAngles avgt 10 143.886 10.281 us/op
b.Test.sumOfAnglesSOA avgt 10 70.607 3.719 us/op

N = 10_000

Benchmark Mode Samples Score Score error Units
b.Test.scalarAgentMath avgt 10 51.648 0.908 us/op
b.Test.scalarJavaMath avgt 10 903.191 4.800 us/op
b.Test.scalarRivenMath avgt 10 22.001 0.099 us/op
b.Test.sumOfAngles avgt 10 14.263 0.107 us/op
b.Test.sumOfAnglesSOA avgt 10 4.646 0.019 us/op
`

I got jitwatch working and can confirm the SOA variant is using packed arithmetic.

What does “Score error” mean?

Right. What about precision? What’s the accuracy of those methods? It’d be interesting to see the performance/quality tradeoffs they make.

Output of abs(expected - actual) across whole array, at N = 1_000_000:
(Ignore DoubleXX, all the comp is done in single precision)

`
agent
DoubleSummaryStatistics{count=2000000, sum=33.722636, min=0.000000, average=0.000017, max=0.000083}

sumOfAnglesSOA
DoubleSummaryStatistics{count=2000000, sum=745.938334, min=0.000000, average=0.000373, max=0.001289}

riven
DoubleSummaryStatistics{count=2000000, sum=488.212535, min=0.000000, average=0.000244, max=0.000762}

sumOfAngles
DoubleSummaryStatistics{count=2000000, sum=745.938334, min=0.000000, average=0.000373, max=0.001289}
`

Interesting. I wasn’t expecting mine to beat the SOA. Are you sure you’re not getting an unrealistic amount of compound error?

The real thing to notice is that the sumOfAngles (both variants) were using Riven’s trig. Using Math.sin/cos it has very little error:

`sumOfAngles
DoubleSummaryStatistics{count=20000, sum=0.025140, min=0.000000, average=0.000001, max=0.000004}

sumOfAnglesSOA
DoubleSummaryStatistics{count=20000, sum=0.025140, min=0.000000, average=0.000001, max=0.000004}

agent
DoubleSummaryStatistics{count=20000, sum=0.250961, min=0.000000, average=0.000013, max=0.000083}

riven
DoubleSummaryStatistics{count=20000, sum=5.018816, min=0.000000, average=0.000251, max=0.000762}
`

Speed is basically unaffected, as expected:

`
N = 10_000

Benchmark Mode Samples Score Score error Units
b.Test.scalarAgentMath avgt 10 50.894 1.278 us/op
b.Test.scalarJavaMath avgt 10 876.978 6.797 us/op
b.Test.scalarRivenMath avgt 10 21.657 0.129 us/op
b.Test.sumOfAngles avgt 10 14.239 0.094 us/op
b.Test.sumOfAnglesSOA avgt 10 4.663 0.041 us/op
`

I probably should have remembered to do this earlier.

Aha, that explains it! Yes, this is more in line with what I expected. Thank you.

I found a bug in my implementation. To floor the index, I simply do (int)index. This produces wrong results for negative angles. A correct implementation should use (float)Math.floor(index) instead.

I also found an optimization: [icode]sin1 + (sin2 - sin1) * alpha[/icode] is faster than the original [icode]alpha * sin1 + (1 - alpha) * sin2[/icode] by a solid 10-15%.

My full code:


	private static final int SIN_BITS, SIN_MASK, SIN_COUNT;
	private static final float radToIndex, cosOffset;
	public static final float[] sin;
	
	static {
		SIN_BITS = 9;
		SIN_MASK = ~(-1 << SIN_BITS);
		SIN_COUNT = SIN_MASK + 1;
		
		float radFull = (float) (Math.PI * 2.0);
		radToIndex = SIN_COUNT / radFull;
		
		cosOffset = (float)(Math.PI / 2);
		
		sin = new float[SIN_COUNT+1];
		
		for (int i = 0; i <= SIN_COUNT; i++) {
			sin[i] = (float) Math.sin(Math.PI * 2 * i / SIN_COUNT);
		}
	}
	
	public static final float sin(float rad) {
		float index = rad * radToIndex;
		//float floor = (float)Math.floor(index); //Correct
		float floor = (int)index;	              //Fast, only for positive angles
		
		float alpha = index - floor;
		
		int i = (int)(index) & SIN_MASK;
		
		float sin1 = sin[i+0];
		float sin2 = sin[i+1];
		
		return sin1 + (sin2 - sin1) * alpha;
	}
	
	
	public static final float cos(float rad) {
		return sin(rad + cosOffset);
	}