Extremely Fast sine/cosine

Hey all,

This is a breakout topic from the atan2 one where we started listing a number of alternate sin/cos implementations. To compare them, I set up a simple JMH benchmark and an accuracy test. Criticism and new sin/cos submissions are appreciated.

Recommended implementation with lookup table: Riven’s
Recommended implementation without lookup table: FastCosSin.java posted by kappa

Warning: some of these only work within a limited input range. Fast’s implementation, for example, is intended for values between -3PI and 3PI.

Relevant data

     Current results:
     math_default_sin   : Average Error 0.00000 / Largest Error 0.00000 - Performance 207.574 ns/op
     math_devmaster_sin : Average Error 0.00050 / Largest Error 0.00109 - Performance 232.644 ns/op
     math_fast_sin      : Average Error 0.02996 / Largest Error 0.05601 - Performance   8.812 ns/op
     math_icecore_sin   : Average Error 0.00036 / Largest Error 0.00112 - Performance 126.611 ns/op
     math_riven_sin     : Average Error 0.00060 / Largest Error 0.00224 - Performance   7.054 ns/op
        
     math_default_cos   : Average Error 0.00000 / Largest Error 0.00000 - Performance 206.086 ns/op
     math_devmaster_cos : Average Error 0.00050 / Largest Error 0.00109 - Performance 231.762 ns/op
     math_fast_cos      : Average Error 0.02996 / Largest Error 0.05601 - Performance  11.096 ns/op
     math_icecore_cos   : Average Error 0.00036 / Largest Error 0.00112 - Performance 126.019 ns/op
     math_riven_cos     : Average Error 0.00060 / Largest Error 0.00224 - Performance   7.306 ns/op

The raw data is provided below so you can interpret it for yourselves. Links to the original sources are given in the code comments.

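import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

// JMH expects @State on a benchmark class with instance fields; Scope.Thread is assumed here.
@State(Scope.Thread)
public class SIN {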

    // Example values used in the tests:
    //   15 deg = 0.2617993878 rad
    //   105 deg = 1.8325957146 rad
    //   285 deg = 4.9741883682 rad
    public float valueFloatA = 0.2617993878f;
    public float valueFloatB = 1.8325957146f;
    public float valueFloatC = 4.9741883682f;
    public double valueDoubleA = 0.2617993878;
    public double valueDoubleB = 1.8325957146;
    public double valueDoubleC = 4.9741883682;
    ///////////////////////////////////////
    // Default sin
    ///////////////////////////////////////

    @Benchmark
    public double math_default_sin() {
        return Math.sin(valueDoubleA) + Math.sin(valueDoubleB) + Math.sin(valueDoubleC);
    }

    @Benchmark
    public double math_default_cos() {
        return Math.cos(valueDoubleA) + Math.cos(valueDoubleB) + Math.cos(valueDoubleC);
    }

    ///////////////////////////////////////
    // FastCosSin.java posted by kappa  ( http://www.java-gaming.org/topics/extremely-fast-atan2/36467/msg/346117/view.html#msg346117 )
    ///////////////////////////////////////

    public static final class Fast {

        private static final float PI = 3.1415927f;
        private static final float MINUS_PI = -PI;
        private static final float DOUBLE_PI = PI * 2f;
        private static final float PI_2 = PI / 2f;

        private static final float CONST_1 = 4f / PI;
        private static final float CONST_2 = 4f / (PI * PI);

        public static final float sin(float x) {
            if (x < MINUS_PI) {
                x += DOUBLE_PI;
            } else if (x > PI) {
                x -= DOUBLE_PI;
            }

            return (x < 0f) ? (CONST_1 * x + CONST_2 * x * x)
                    : (CONST_1 * x - CONST_2 * x * x);
        }

        public static final float cos(float x) {
            if (x < MINUS_PI) {
                x += DOUBLE_PI;
            } else if (x > PI) {
                x -= DOUBLE_PI;
            }

            x += PI_2;

            if (x > PI) {
                x -= DOUBLE_PI;
            }

            return (x < 0f) ? (CONST_1 * x + CONST_2 * x * x)
                    : (CONST_1 * x - CONST_2 * x * x);
        }
    }

    @Benchmark
    public double math_fast_sin() {
        return Fast.sin(valueFloatA) + Fast.sin(valueFloatB) + Fast.sin(valueFloatC);
    }

    @Benchmark
    public double math_fast_cos() {
        return Fast.cos(valueFloatA) + Fast.cos(valueFloatB) + Fast.cos(valueFloatC);
    }

    ///////////////////////////////////////
    // Devmaster's sine/cosine ( http://forum.devmaster.net/t/fast-and-accurate-sine-cosine/9648 )
    ///////////////////////////////////////

    public static final class Devmaster {

        public static final float PI = 3.1415927f;
        public static final float PI_2 = PI / 2f;
        public static final float DOUBLE_PI = PI * 2f;
        public static final float B = 4 / PI;
        public static final float C = -4 / (PI * PI);
        public static final float P = 0.225f;

        public static final float sin(float x) {
            float x1 = x % PI;
            float x2 = x % DOUBLE_PI;

            if (x > 0) {
                float y = x1 * (B + C * x1);
                y = (y > 0) ? (y = P * (y * y - y) + y)
                        : (y = P * (-y * y - y) + y);
                float xp = x2 - DOUBLE_PI;
                if (!(xp < 0 && xp < -PI)) {
                    y = -y;
                }
                return y;
            } else {
                float y = x1 * (B - C * x1);
                y = (y > 0) ? (y = P * (y * y - y) + y)
                        : (y = P * (-y * y - y) + y);
                float xp = x2 + DOUBLE_PI;
                if (xp > 0 && xp < PI) {
                    y = -y;
                }
                return y;
            }
        }

        public static final float cos(float x) {
            float x0 = x + PI_2;
            float x1 = x0 % PI;
            float x2 = x0 % DOUBLE_PI;

            if (x0 > 0) {
                float y = x1 * (B + C * x1);
                y = (y > 0) ? (y = P * (y * y - y) + y)
                        : (y = P * (-y * y - y) + y);
                float xp = x2 - DOUBLE_PI;
                if (!(xp < 0 && xp < -PI)) {
                    y = -y;
                }
                return y;
            } else {
                float y = x1 * (B - C * x1);
                y = (y > 0) ? (y = P * (y * y - y) + y)
                        : (y = P * (-y * y - y) + y);
                float xp = x2 + DOUBLE_PI;
                if (xp > 0 && xp < PI) {
                    y = -y;
                }
                return y;
            }
        }
    }

    @Benchmark
    public double math_devmaster_sin() {
        return Devmaster.sin(valueFloatA) + Devmaster.sin(valueFloatB) + Devmaster.sin(valueFloatC);
    }

    @Benchmark
    public double math_devmaster_cos() {
        return Devmaster.cos(valueFloatA) + Devmaster.cos(valueFloatB) + Devmaster.cos(valueFloatC);
    }

    ///////////////////////////////////////
    // Riven's sine/cosine ( http://www.java-gaming.org/topics/fast-math-sin-cos-lookup-tables/24191/view.html )
    ///////////////////////////////////////

    public static final class Riven {

        private static final int SIN_BITS, SIN_MASK, SIN_COUNT;
        private static final float radFull, radToIndex;
        private static final float degFull, degToIndex;
        private static final float[] sin, cos;

        static {
            SIN_BITS = 12;
            SIN_MASK = ~(-1 << SIN_BITS);
            SIN_COUNT = SIN_MASK + 1;

            radFull = (float) (Math.PI * 2.0);
            degFull = (float) (360.0);
            radToIndex = SIN_COUNT / radFull;
            degToIndex = SIN_COUNT / degFull;

            sin = new float[SIN_COUNT];
            cos = new float[SIN_COUNT];

            for (int i = 0; i < SIN_COUNT; i++) {
                sin[i] = (float) Math.sin((i + 0.5f) / SIN_COUNT * radFull);
                cos[i] = (float) Math.cos((i + 0.5f) / SIN_COUNT * radFull);
            }

            // Four cardinal directions (credits: Nate)
            for (int i = 0; i < 360; i += 90) {
                sin[(int) (i * degToIndex) & SIN_MASK] = (float) Math.sin(i * Math.PI / 180.0);
                cos[(int) (i * degToIndex) & SIN_MASK] = (float) Math.cos(i * Math.PI / 180.0);
            }
        }

        public static final float sin(float rad) {
            return sin[(int) (rad * radToIndex) & SIN_MASK];
        }

        public static final float cos(float rad) {
            return cos[(int) (rad * radToIndex) & SIN_MASK];
        }
    }

    @Benchmark
    public double math_riven_sin() {
        return Riven.sin(valueFloatA) + Riven.sin(valueFloatB) + Riven.sin(valueFloatC);
    }

    @Benchmark
    public double math_riven_cos() {
        return Riven.cos(valueFloatA) + Riven.cos(valueFloatB) + Riven.cos(valueFloatC);
    }

    ///////////////////////////////////////
    // Icecore's sine/cosine ( http://www.java-gaming.org/topics/extremely-fast-sine-cosine/36469/msg/346190/view.html#msg346190 )
    ///////////////////////////////////////

    public static final class Icecore {

        private static final int Size_SC_Ac = 5000;
        private static final int Size_SC_Ar = Size_SC_Ac + 1;
        private static final float Sin[] = new float[Size_SC_Ar];
        private static final float Cos[] = new float[Size_SC_Ar];
        private static final float Pi = (float) Math.PI;
        private static final float Pi_D = Pi * 2;
        private static final float Pi_SC_D = Pi_D / Size_SC_Ac;

        static {
            for (int i = 0; i < Size_SC_Ar; i++) {
                double d = i * Pi_SC_D;
                Sin[i] = (float) Math.sin(d);
                Cos[i] = (float) Math.cos(d);
            }
        }

        public static final float sin(float r) {
            float rp = r % Pi_D;
            if (rp < 0) {
                rp += Pi_D;
            }
            return Sin[(int) (rp / Pi_SC_D)];
        }

        public static final float cos(float r) {
            float rp = r % Pi_D;
            if (rp < 0) {
                rp += Pi_D;
            }
            return Cos[(int) (rp / Pi_SC_D)];
        }
    }

    @Benchmark
    public double math_icecore_sin() {
        return Icecore.sin(valueFloatA) + Icecore.sin(valueFloatB) + Icecore.sin(valueFloatC);
    }

    @Benchmark
    public double math_icecore_cos() {
        return Icecore.cos(valueFloatA) + Icecore.cos(valueFloatB) + Icecore.cos(valueFloatC);
    }
}
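To make the index trick in Riven’s lookup a bit more concrete, here is a minimal sketch of just the masking step. The class and method names (`MaskedLookup`, `index`) are mine, and the four-cardinal-directions fix-up is left out, so treat it as an illustration rather than Riven’s exact code. The point is that `& MASK` wraps both negative and large angles into the table without any `%` or branch.

```java
final class MaskedLookup {
    static final int BITS = 12;
    static final int MASK = ~(-1 << BITS);          // 4095, low 12 bits set
    static final int COUNT = MASK + 1;              // 4096 table entries
    static final float RAD_TO_INDEX = COUNT / (float) (Math.PI * 2.0);
    static final float[] SIN = new float[COUNT];

    static {
        // Each entry holds the sine at the middle of its bin.
        for (int i = 0; i < COUNT; i++) {
            SIN[i] = (float) Math.sin((i + 0.5) / COUNT * Math.PI * 2.0);
        }
    }

    /** Maps an angle in radians to a table index; the mask wraps any int into [0, COUNT). */
    static int index(float rad) {
        return (int) (rad * RAD_TO_INDEX) & MASK;
    }

    static float sin(float rad) {
        return SIN[index(rad)];
    }

    public static void main(String[] args) {
        System.out.println(index(0.1f));   // 65
        System.out.println(index(-0.1f));  // 4031, i.e. 4096 - 65
        System.out.println(sin(-0.1f) + " vs " + Math.sin(-0.1f));
    }
}
```

Because the `(int)` cast truncates toward zero, negative angles land up to one bin further from the true value than positive ones do, which may explain why the largest error is a bit more than one table step.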

Accuracy

    ///////////////////////////////////////
    // Accuracy
    ///////////////////////////////////////

    public static void main(String[] args) {
        int range = 180;
        double[] totalCos = new double[5];
        double[] totalSin = new double[5];
        double[] largestErrorCos = new double[5];
        double[] largestErrorSin = new double[5];
        float conversion = (float) (Math.PI / 180.0f);
        int count = 0;
        for (int x = -range; x <= range; x++) {
            float value = conversion * x;
            double result;

            // Cos

            result = Math.abs(Math.cos(value) - Math.cos(value));
            totalCos[0] += result;
            largestErrorCos[0] = result > largestErrorCos[0] ? result : largestErrorCos[0];

            result = Math.abs(Math.cos(value) - Devmaster.cos(value));
            totalCos[1] += result;
            largestErrorCos[1] = result > largestErrorCos[1] ? result : largestErrorCos[1];

            result = Math.abs(Math.cos(value) - Fast.cos(value));
            totalCos[2] += result;
            largestErrorCos[2] = result > largestErrorCos[2] ? result : largestErrorCos[2];

            result = Math.abs(Math.cos(value) - Riven.cos(value));
            totalCos[3] += result;
            largestErrorCos[3] = result > largestErrorCos[3] ? result : largestErrorCos[3];

            result = Math.abs(Math.cos(value) - Icecore.cos(value));
            totalCos[4] += result;
            largestErrorCos[4] = result > largestErrorCos[4] ? result : largestErrorCos[4];

            // Sin

            result = Math.abs(Math.sin(value) - Math.sin(value));
            totalSin[0] += result;
            largestErrorSin[0] = result > largestErrorSin[0] ? result : largestErrorSin[0];

            result = Math.abs(Math.sin(value) - Devmaster.sin(value));
            totalSin[1] += result;
            largestErrorSin[1] = result > largestErrorSin[1] ? result : largestErrorSin[1];

            result = Math.abs(Math.sin(value) - Fast.sin(value));
            totalSin[2] += result;
            largestErrorSin[2] = result > largestErrorSin[2] ? result : largestErrorSin[2];

            result = Math.abs(Math.sin(value) - Riven.sin(value));
            totalSin[3] += result;
            largestErrorSin[3] = result > largestErrorSin[3] ? result : largestErrorSin[3];

            result = Math.abs(Math.sin(value) - Icecore.sin(value));
            totalSin[4] += result;
            largestErrorSin[4] = result > largestErrorSin[4] ? result : largestErrorSin[4];

            count++;
        }
        System.out.println(String.format("A lower average means higher accuracy. Results over %,d samples.", count));
        // Cos
        System.out.println(String.format("math_default_cos   : Average Error %.5f / Largest Error %.5f", totalCos[0] / count, largestErrorCos[0]));
        System.out.println(String.format("math_devmaster_cos : Average Error %.5f / Largest Error %.5f", totalCos[1] / count, largestErrorCos[1]));
        System.out.println(String.format("math_fast_cos      : Average Error %.5f / Largest Error %.5f", totalCos[2] / count, largestErrorCos[2]));
        System.out.println(String.format("math_icecore_cos   : Average Error %.5f / Largest Error %.5f", totalCos[4] / count, largestErrorCos[4]));
        System.out.println(String.format("math_riven_cos     : Average Error %.5f / Largest Error %.5f", totalCos[3] / count, largestErrorCos[3]));
        // Sin
        System.out.println(String.format("math_default_sin   : Average Error %.5f / Largest Error %.5f", totalSin[0] / count, largestErrorSin[0]));
        System.out.println(String.format("math_devmaster_sin : Average Error %.5f / Largest Error %.5f", totalSin[1] / count, largestErrorSin[1]));
        System.out.println(String.format("math_fast_sin      : Average Error %.5f / Largest Error %.5f", totalSin[2] / count, largestErrorSin[2]));
        System.out.println(String.format("math_icecore_sin   : Average Error %.5f / Largest Error %.5f", totalSin[4] / count, largestErrorSin[4]));
        System.out.println(String.format("math_riven_sin     : Average Error %.5f / Largest Error %.5f", totalSin[3] / count, largestErrorSin[3]));
        /*
         Range of 180
         A lower average means higher accuracy. Results over 361 samples.
         math_default_cos   : Average Error 0.00000 / Largest Error 0.00000
         math_devmaster_cos : Average Error 0.00050 / Largest Error 0.00109
         math_fast_cos      : Average Error 0.02996 / Largest Error 0.05601
         math_icecore_cos   : Average Error 0.00036 / Largest Error 0.00112
         math_riven_cos     : Average Error 0.00060 / Largest Error 0.00224
        
         math_default_sin   : Average Error 0.00000 / Largest Error 0.00000
         math_devmaster_sin : Average Error 0.00050 / Largest Error 0.00109
         math_fast_sin      : Average Error 0.02996 / Largest Error 0.05601
         math_icecore_sin   : Average Error 0.00036 / Largest Error 0.00112
         math_riven_sin     : Average Error 0.00060 / Largest Error 0.00224
        
         Range of 10000
         A lower average means higher accuracy. Results over 20,001 samples.
         math_default_cos   : Average Error 0.00000 / Largest Error 0.00000
         math_devmaster_cos : Average Error 0.00051 / Largest Error 0.00110
         math_fast_cos      : Average Error 3385.79416 / Largest Error 11047.17010 !!! Test goes outside of intended range
         math_icecore_cos   : Average Error 0.00040 / Largest Error 0.00126
         math_riven_cos     : Average Error 0.00061 / Largest Error 0.00230
        
         math_default_sin   : Average Error 0.00000 / Largest Error 0.00000
         math_devmaster_sin : Average Error 0.00050 / Largest Error 0.00110
         math_fast_sin      : Average Error 3583.72491 / Largest Error 11257.58258 !!! Test goes outside of intended range
         math_icecore_sin   : Average Error 0.00039 / Largest Error 0.00126
         math_riven_sin     : Average Error 0.00061 / Largest Error 0.00230
         */
    }
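The accuracy loop above repeats the same three lines for every implementation. As a sketch of the same measurement with less repetition, the helper below takes the approximation as a function argument. The names `AccuracyCheck` and `sinError` are hypothetical, and it only covers sine against `Math.sin` over whole degrees, same as the test above.

```java
import java.util.function.DoubleUnaryOperator;

final class AccuracyCheck {

    /**
     * Compares approx against Math.sin for every whole degree in [-range, range].
     * Returns { averageError, largestError }.
     */
    static double[] sinError(DoubleUnaryOperator approx, int range) {
        double total = 0.0;
        double largest = 0.0;
        int count = 0;
        for (int deg = -range; deg <= range; deg++) {
            float rad = (float) Math.toRadians(deg);
            double err = Math.abs(Math.sin(rad) - approx.applyAsDouble(rad));
            total += err;
            largest = Math.max(largest, err);
            count++;
        }
        return new double[] { total / count, largest };
    }

    public static void main(String[] args) {
        // Baseline: Math.sin against itself should report zero error.
        double[] r = sinError(Math::sin, 180);
        System.out.println(String.format("math_default_sin   : Average Error %.5f / Largest Error %.5f", r[0], r[1]));
    }
}
```

Any of the float implementations can be passed in with a small lambda, for example `x -> Riven.sin((float) x)`.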

Benchmark results

# JMH 1.9.3 (released 73 days ago)
# VM invoker: C:\Program Files\Java\jdk1.8.0_45\jre\bin\java.exe
# VM options: <none>
# Warmup: 5 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: com.gmail.mooman219.benchmark_test.SIN.math_default_cos

# Run progress: 0.00% complete, ETA 00:01:40
# Fork: 1 of 1
# Warmup Iteration   1: 210.988 ns/op
# Warmup Iteration   2: 205.902 ns/op
# Warmup Iteration   3: 207.415 ns/op
# Warmup Iteration   4: 207.827 ns/op
# Warmup Iteration   5: 204.809 ns/op
Iteration   1: 204.567 ns/op
Iteration   2: 204.636 ns/op
Iteration   3: 209.536 ns/op
Iteration   4: 206.779 ns/op
Iteration   5: 204.910 ns/op


Result "math_default_cos":
  206.086 ±(99.9%) 8.208 ns/op [Average]
  (min, avg, max) = (204.567, 206.086, 209.536), stdev = 2.132
  CI (99.9%): [197.878, 214.293] (assumes normal distribution)


# JMH 1.9.3 (released 73 days ago)
# VM invoker: C:\Program Files\Java\jdk1.8.0_45\jre\bin\java.exe
# VM options: <none>
# Warmup: 5 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: com.gmail.mooman219.benchmark_test.SIN.math_default_sin

# Run progress: 10.00% complete, ETA 00:01:33
# Fork: 1 of 1
# Warmup Iteration   1: 207.585 ns/op
# Warmup Iteration   2: 206.643 ns/op
# Warmup Iteration   3: 212.465 ns/op
# Warmup Iteration   4: 209.597 ns/op
# Warmup Iteration   5: 207.192 ns/op
Iteration   1: 206.839 ns/op
Iteration   2: 206.776 ns/op
Iteration   3: 209.633 ns/op
Iteration   4: 206.813 ns/op
Iteration   5: 207.809 ns/op


Result "math_default_sin":
  207.574 ±(99.9%) 4.736 ns/op [Average]
  (min, avg, max) = (206.776, 207.574, 209.633), stdev = 1.230
  CI (99.9%): [202.838, 212.310] (assumes normal distribution)


# JMH 1.9.3 (released 73 days ago)
# VM invoker: C:\Program Files\Java\jdk1.8.0_45\jre\bin\java.exe
# VM options: <none>
# Warmup: 5 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: com.gmail.mooman219.benchmark_test.SIN.math_devmaster_cos

# Run progress: 20.00% complete, ETA 00:01:23
# Fork: 1 of 1
# Warmup Iteration   1: 231.947 ns/op
# Warmup Iteration   2: 232.759 ns/op
# Warmup Iteration   3: 233.528 ns/op
# Warmup Iteration   4: 232.575 ns/op
# Warmup Iteration   5: 230.663 ns/op
Iteration   1: 231.842 ns/op
Iteration   2: 232.178 ns/op
Iteration   3: 233.663 ns/op
Iteration   4: 230.563 ns/op
Iteration   5: 230.563 ns/op


Result "math_devmaster_cos":
  231.762 ±(99.9%) 4.972 ns/op [Average]
  (min, avg, max) = (230.563, 231.762, 233.663), stdev = 1.291
  CI (99.9%): [226.790, 236.734] (assumes normal distribution)


# JMH 1.9.3 (released 73 days ago)
# VM invoker: C:\Program Files\Java\jdk1.8.0_45\jre\bin\java.exe
# VM options: <none>
# Warmup: 5 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: com.gmail.mooman219.benchmark_test.SIN.math_devmaster_sin

# Run progress: 30.00% complete, ETA 00:01:12
# Fork: 1 of 1
# Warmup Iteration   1: 230.069 ns/op
# Warmup Iteration   2: 229.269 ns/op
# Warmup Iteration   3: 234.922 ns/op
# Warmup Iteration   4: 230.136 ns/op
# Warmup Iteration   5: 232.788 ns/op
Iteration   1: 230.068 ns/op
Iteration   2: 230.098 ns/op
Iteration   3: 235.049 ns/op
Iteration   4: 237.674 ns/op
Iteration   5: 230.333 ns/op


Result "math_devmaster_sin":
  232.644 ±(99.9%) 13.552 ns/op [Average]
  (min, avg, max) = (230.068, 232.644, 237.674), stdev = 3.519
  CI (99.9%): [219.093, 246.196] (assumes normal distribution)


# JMH 1.9.3 (released 73 days ago)
# VM invoker: C:\Program Files\Java\jdk1.8.0_45\jre\bin\java.exe
# VM options: <none>
# Warmup: 5 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: com.gmail.mooman219.benchmark_test.SIN.math_fast_cos

# Run progress: 40.00% complete, ETA 00:01:02
# Fork: 1 of 1
# Warmup Iteration   1: 11.612 ns/op
# Warmup Iteration   2: 11.429 ns/op
# Warmup Iteration   3: 11.423 ns/op
# Warmup Iteration   4: 10.993 ns/op
# Warmup Iteration   5: 11.219 ns/op
Iteration   1: 10.976 ns/op
Iteration   2: 10.966 ns/op
Iteration   3: 11.295 ns/op
Iteration   4: 11.272 ns/op
Iteration   5: 10.972 ns/op


Result "math_fast_cos":
  11.096 ±(99.9%) 0.658 ns/op [Average]
  (min, avg, max) = (10.966, 11.096, 11.295), stdev = 0.171
  CI (99.9%): [10.438, 11.755] (assumes normal distribution)


# JMH 1.9.3 (released 73 days ago)
# VM invoker: C:\Program Files\Java\jdk1.8.0_45\jre\bin\java.exe
# VM options: <none>
# Warmup: 5 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: com.gmail.mooman219.benchmark_test.SIN.math_fast_sin

# Run progress: 50.00% complete, ETA 00:00:51
# Fork: 1 of 1
# Warmup Iteration   1: 9.646 ns/op
# Warmup Iteration   2: 9.421 ns/op
# Warmup Iteration   3: 8.971 ns/op
# Warmup Iteration   4: 8.905 ns/op
# Warmup Iteration   5: 8.778 ns/op
Iteration   1: 8.794 ns/op
Iteration   2: 8.839 ns/op
Iteration   3: 9.057 ns/op
Iteration   4: 8.685 ns/op
Iteration   5: 8.686 ns/op


Result "math_fast_sin":
  8.812 ±(99.9%) 0.587 ns/op [Average]
  (min, avg, max) = (8.685, 8.812, 9.057), stdev = 0.153
  CI (99.9%): [8.225, 9.399] (assumes normal distribution)


# JMH 1.9.3 (released 73 days ago)
# VM invoker: C:\Program Files\Java\jdk1.8.0_45\jre\bin\java.exe
# VM options: <none>
# Warmup: 5 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: com.gmail.mooman219.benchmark_test.SIN.math_icecore_cos

# Run progress: 60.00% complete, ETA 00:00:41
# Fork: 1 of 1
# Warmup Iteration   1: 126.369 ns/op
# Warmup Iteration   2: 124.888 ns/op
# Warmup Iteration   3: 127.466 ns/op
# Warmup Iteration   4: 125.693 ns/op
# Warmup Iteration   5: 126.229 ns/op
Iteration   1: 125.641 ns/op
Iteration   2: 125.819 ns/op
Iteration   3: 127.509 ns/op
Iteration   4: 125.537 ns/op
Iteration   5: 125.587 ns/op


Result "math_icecore_cos":
  126.019 ±(99.9%) 3.234 ns/op [Average]
  (min, avg, max) = (125.537, 126.019, 127.509), stdev = 0.840
  CI (99.9%): [122.785, 129.253] (assumes normal distribution)


# JMH 1.9.3 (released 73 days ago)
# VM invoker: C:\Program Files\Java\jdk1.8.0_45\jre\bin\java.exe
# VM options: <none>
# Warmup: 5 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: com.gmail.mooman219.benchmark_test.SIN.math_icecore_sin

# Run progress: 70.00% complete, ETA 00:00:31
# Fork: 1 of 1
# Warmup Iteration   1: 126.271 ns/op
# Warmup Iteration   2: 124.888 ns/op
# Warmup Iteration   3: 127.539 ns/op
# Warmup Iteration   4: 125.718 ns/op
# Warmup Iteration   5: 125.784 ns/op
Iteration   1: 125.551 ns/op
Iteration   2: 126.679 ns/op
Iteration   3: 127.568 ns/op
Iteration   4: 127.394 ns/op
Iteration   5: 125.862 ns/op


Result "math_icecore_sin":
  126.611 ±(99.9%) 3.454 ns/op [Average]
  (min, avg, max) = (125.551, 126.611, 127.568), stdev = 0.897
  CI (99.9%): [123.157, 130.065] (assumes normal distribution)


# JMH 1.9.3 (released 73 days ago)
# VM invoker: C:\Program Files\Java\jdk1.8.0_45\jre\bin\java.exe
# VM options: <none>
# Warmup: 5 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: com.gmail.mooman219.benchmark_test.SIN.math_riven_cos

# Run progress: 80.00% complete, ETA 00:00:20
# Fork: 1 of 1
# Warmup Iteration   1: 7.710 ns/op
# Warmup Iteration   2: 7.779 ns/op
# Warmup Iteration   3: 7.438 ns/op
# Warmup Iteration   4: 7.180 ns/op
# Warmup Iteration   5: 7.324 ns/op
Iteration   1: 7.320 ns/op
Iteration   2: 7.315 ns/op
Iteration   3: 7.411 ns/op
Iteration   4: 7.172 ns/op
Iteration   5: 7.314 ns/op


Result "math_riven_cos":
  7.306 ±(99.9%) 0.330 ns/op [Average]
  (min, avg, max) = (7.172, 7.306, 7.411), stdev = 0.086
  CI (99.9%): [6.976, 7.636] (assumes normal distribution)


# JMH 1.9.3 (released 73 days ago)
# VM invoker: C:\Program Files\Java\jdk1.8.0_45\jre\bin\java.exe
# VM options: <none>
# Warmup: 5 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: com.gmail.mooman219.benchmark_test.SIN.math_riven_sin

# Run progress: 90.00% complete, ETA 00:00:10
# Fork: 1 of 1
# Warmup Iteration   1: 7.652 ns/op
# Warmup Iteration   2: 7.808 ns/op
# Warmup Iteration   3: 7.112 ns/op
# Warmup Iteration   4: 7.011 ns/op
# Warmup Iteration   5: 7.024 ns/op
Iteration   1: 7.121 ns/op
Iteration   2: 7.016 ns/op
Iteration   3: 7.112 ns/op
Iteration   4: 7.000 ns/op
Iteration   5: 7.021 ns/op


Result "math_riven_sin":
  7.054 ±(99.9%) 0.223 ns/op [Average]
  (min, avg, max) = (7.000, 7.054, 7.121), stdev = 0.058
  CI (99.9%): [6.831, 7.276] (assumes normal distribution)


# Run complete. Total time: 00:01:43

Benchmark               Mode  Cnt    Score    Error  Units
SIN.math_default_cos    avgt    5  206.086 ±  8.208  ns/op
SIN.math_default_sin    avgt    5  207.574 ±  4.736  ns/op
SIN.math_devmaster_cos  avgt    5  231.762 ±  4.972  ns/op
SIN.math_devmaster_sin  avgt    5  232.644 ± 13.552  ns/op
SIN.math_fast_cos       avgt    5   11.096 ±  0.658  ns/op
SIN.math_fast_sin       avgt    5    8.812 ±  0.587  ns/op
SIN.math_icecore_cos    avgt    5  126.019 ±  3.234  ns/op
SIN.math_icecore_sin    avgt    5  126.611 ±  3.454  ns/op
SIN.math_riven_cos      avgt    5    7.306 ±  0.330  ns/op
SIN.math_riven_sin      avgt    5    7.054 ±  0.223  ns/op

Keep in mind that these are microbenchmarks and not indicative of real-world performance; JMH can only do so much to make sure the JIT doesn’t interfere. There may be a fair amount of branch prediction and caching going on. For example, the lookup table in Riven’s implementation might just be sitting in a CPU cache.

(Referring to kappa’s implementation. Keep in mind I could’ve missed something rather important.)


    if (x < MINUS_PI) {
        x += DOUBLE_PI;
    } else if (x > PI) {
        x -= DOUBLE_PI;
    }

    x += PI_2;

    if (x > PI) {
        x -= DOUBLE_PI;
    }

Should probably be replaced by


    x += PI_2;

    x %= DOUBLE_PI;
    if (x > PI) {
        x -= DOUBLE_PI;
    } else if (x < MINUS_PI) {
        x += DOUBLE_PI;
    }

For 1) simplicity, and 2) so it works properly on numbers outside the range of -3PI to 3PI. Unless the JVM is doing some insane stuff under the hood it shouldn’t really impact performance. And if it is, you may as well remove some of the if checks in the original code and just assume the values passed in were within ±PI as specified.

The sine function can be changed similarly:


    x %= DOUBLE_PI;
    if (x > PI) {
        x -= DOUBLE_PI;
    } else if (x < MINUS_PI) {
        x += DOUBLE_PI;
    }
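As a self-contained sketch of modulo-based range reduction feeding the parabola approximation: the remainder is folded back into [-PI, PI] with one comparison on each side. The class name `WrappedFastSin` and the `wrap` helper are my own, and the constants are copied from the Fast class in the first post. This only shows that the wrap keeps the error bounded for inputs far outside ±3PI; it says nothing about speed.

```java
final class WrappedFastSin {
    private static final float PI = 3.1415927f;
    private static final float DOUBLE_PI = PI * 2f;
    private static final float CONST_1 = 4f / PI;
    private static final float CONST_2 = 4f / (PI * PI);

    /** Wraps any finite angle into [-PI, PI]. Java's % keeps the sign of the dividend. */
    static float wrap(float x) {
        x %= DOUBLE_PI;              // now in (-2PI, 2PI)
        if (x > PI) {
            x -= DOUBLE_PI;
        } else if (x < -PI) {
            x += DOUBLE_PI;
        }
        return x;
    }

    /** Parabola approximation of sine, applied after wrapping. */
    static float sin(float x) {
        x = wrap(x);
        return (x < 0f) ? (CONST_1 * x + CONST_2 * x * x)
                : (CONST_1 * x - CONST_2 * x * x);
    }

    public static void main(String[] args) {
        for (float x : new float[] { 0.5f, 10f, -25f, 100f }) {
            System.out.printf("x=%.1f approx=%.5f Math.sin=%.5f%n", x, sin(x), Math.sin(x));
        }
    }
}
```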

HeroGraveDev, thanks for the modification. It does clean up the code; however, after benchmarking, it seems that the modulus ‘%’ operator is pretty slow. My benchmark results after warming up and running 100,000,000 invocations of both sin/cos are as follows:


Default   : 22383602954ns (fifth)
Devmaster :      106982ns (second)
Fast      :      115546ns (third)
Riven     :      103070ns (first)
Hero      :  6229776220ns (fourth)

If the performance is FIVE orders of magnitude off, the odds are pretty good that HotSpot basically optimized it into a constant or something similar, tuned to the benchmark. A factor of 10 would be beyond expectations, let alone a factor of 100,000. Back to the drawing board.

True. I have now tried everything to prevent constant folding and, as I said earlier in the other thread, devmaster’s sine is about 2x faster than Java’s. It doesn’t get any better. Of course, it will look better if you call something like sine(1.3f) rather than Math.sin(1.3f), because the custom sine can be inlined and constant-folded, while the code for Math.sin() cannot.
But in the end, 2x faster is not worth the dramatic loss in precision for me.

EDIT: You must also make sure that the JIT is not eliminating the invocation of the custom sine function altogether when it is called in a loop. It did that for me when I just had a loop over several million invocations: the JIT completely erased the call to sine(), because it neither changed global program state nor was its result used anywhere else. That resulted in the same runtime no matter how many loop iterations there were.
So I changed the loop to add each invocation result to an accumulator variable. With those changes, devmaster came out 2x faster than Java’s Math.sin().
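For anyone hand-rolling a timing loop, here is a rough sketch of the accumulator idea. The names (`AccumulatorLoop`, `sumOfSines`) and the 0.001 step are arbitrary choices of mine. The input changes on every iteration so the call cannot be folded into a constant, and the sum is returned and printed so the JIT cannot treat the calls as dead code. It is still not a substitute for JMH.

```java
final class AccumulatorLoop {

    /** Sums Math.sin over n slowly increasing inputs and returns the total. */
    static double sumOfSines(int n) {
        double acc = 0.0;
        float x = 0f;
        for (int i = 0; i < n; i++) {
            acc += Math.sin(x);   // result feeds the accumulator, so the call must happen
            x += 0.001f;          // vary the input so the result is not a constant
        }
        return acc;
    }

    public static void main(String[] args) {
        sumOfSines(1_000_000);                    // crude warm-up pass
        long start = System.nanoTime();
        double acc = sumOfSines(1_000_000);
        long elapsed = System.nanoTime() - start;
        // Printing acc keeps the result observable outside the loop.
        System.out.println("sum=" + acc + " time=" + elapsed + "ns");
    }
}
```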

There’s a lot you have to take into account to prevent JIT and CPU magic. I’ve seen a number of benchmarks on this site that didn’t take these problems into consideration, which skewed the results. It pushed me to make an account and post the atan2 comparison so that there would be some solid data on the popular function.

For the original post, JMH prevents dead-code elimination and loop optimization, the multiple values help defeat branch prediction, and the use of public fields as parameters prevents the JIT from caching the results. The lookup tables can still be sitting in the cache, however, and that’s pretty hard to avoid.

 
    // Accuracy
    public static void main(String[] args) {
        int range = 90;

I think the range must be bigger, like 9000.
For unusual values, who knows when a function might return a critically wrong result.

Well, devmaster’s fails badly past 180, but it’s also only designed to work between -180 and 180. I’ll up it to 180 and also provide a second run with up to 9000.

Then we need two performance lists.

I think I fixed it.

(I think the ± handling could easily be folded into the formula somewhere, but I don’t see how.)


    // Assumes Pi, PI_2, B, C and P are defined as in the Devmaster constants (PI, PI_2, B, C, P).
    static final private float Pi_D = Pi * 2;
    private static float devmaster_sin(float x){
    	float x1 = x % Pi;
    	float x2 = x % Pi_D;
    		
        if(x > 0){
        	float y = x1 * (B + C * x1);
        	y = (y > 0) ? (y = P * (y * y - y) + y)
        			    : (y = P * (-y * y - y) + y);
        	float xp = x2 - Pi_D;
        	if(!(xp < 0 && xp < -Pi)){
        		y = -y;
        	}
            return y;
        }
        else{
        	float y = x1 * (B - C * x1);
        	y = (y > 0) ? (y = P * (y * y - y) + y)
    			        : (y = P * (-y * y - y) + y);
        	float xp = x2 + Pi_D;
        	if(xp > 0 && xp < Pi){
        		y = -y;
        	}
            return y;
        }
    }

    private static float devmaster_cos(float x) {
    	float x0 = x + PI_2;
    	float x1 = x0 % Pi;
    	float x2 = x0 % Pi_D;
    		
        if(x0 > 0){
        	float y = x1 * (B + C * x1);
        	y = (y > 0) ? (y = P * (y * y - y) + y)
        			    : (y = P * (-y * y - y) + y);
        	float xp = x2 - Pi_D;
        	if(!(xp < 0 && xp < -Pi)){
        		y = -y;
        	}
            return y;
        }
        else{
        	float y = x1 * (B - C * x1);
        	y = (y > 0) ? P * (y * y - y) + y
    			        : P * (-y * y - y) + y;
        	float xp = x2 + Pi_D;
        	if(xp > 0 && xp < Pi){
        		y = -y;
        	}
            return y;
        }
    }

A lower average means higher accuracy. Results over 18001 samples.
math_devmaster_sin : Average Error 0.00050 / Largest Error 0.00109
math_devmaster_cos : Average Error 0.00050 / Largest Error 0.00110

This is my lookup table )

(Riven’s looks fastest.)


	static final private int Size_SC_Ac = 5000;
	static final private int Size_SC_Ar = Size_SC_Ac + 1;
	static final private float Sin[] = new float[Size_SC_Ar];
	static final private float Cos[] = new float[Size_SC_Ar];
	static final private float Pi = (float)Math.PI;
	static final private float Pi_D = Pi * 2;
	static final private float Pi_SC_D = Pi_D / Size_SC_Ac;
	static{
		for(int i = 0; i < Size_SC_Ar; i++){
			double d = i * Pi_SC_D;
			Sin[i] = (float)Math.sin(d);
			Cos[i] = (float)Math.cos(d);
		}
	}
	
	static final public float sin(float r){
		float rp = r % Pi_D;
		if(rp < 0){
			rp += Pi_D;
		}
		return Sin[(int)(rp / Pi_SC_D)];
	}
	
	static final public float cos(float r){
		float rp = r % Pi_D;
		if(rp < 0){
			rp += Pi_D;
		}
		return Cos[(int)(rp / Pi_SC_D)];
	}

Listening is Believing

I use a LUT for sine waves for my FM synthesizer. So, I had to give Riven’s lookup method a try, actually playing a wave. His sounds just as good. If you’d like to listen: some code where I compare methods follows. (Haven’t tried it with a “full stack” of 3 modulators + carrier yet. I’m guessing it will still sound good, though. Also, I want to understand the method he is using for interpolating between table values. I just use linear interpolation.)

In the code, I show a half dozen runs where all the values needed for 10 seconds’ worth of A-440 are calculated, comparing his lookup and my own. (The first comparison can be thrown out because it is mostly about the audio code getting loaded into memory.) I think his method is even better than the comparison indicates, because I have to manage the cursor to make sure it doesn’t look up a value outside the LUT, and his doesn’t actually require this. I’m assuming the radian value inputs will remain accurate all the way up to Float.MAX_VALUE. Is this true? It takes a long time to get there playing an A=440 at 44100 fps.

Yay! Looks like this discussion will result in a significantly faster FM synthesis processing algorithm for me to use. ;D
[EDIT: removed a duplicate line of code that resulted in the results being 880 instead of 440. Also, it looks like what I have is already pretty well optimized for my usage. Main difference: I’m using a float with a range of [0…tableSize) rather than [0…2PI) so no need to multiply by radFull, and I’m managing the range via if (cursor > tableSize) cursor -= tableSize; so that eliminates the need for the bit mask operation. I am also seeing that my get(int) method is off in phase by a tiny amount.]

package fastSineTest;

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;
import javax.sound.sampled.Line.Info;

public class ListeningIsBelieveing {

	static int bufferSize = 4 * 1024 * 4;
	
	static long timeDuration;
	
	public static void main(String[] args) throws LineUnavailableException, 
		InterruptedException 
	{			
		new RivenSine();
		new PFSine();
				
		float pfIncr, rivenIncr;
		
		rivenIncr = (float)((440 * 2 * Math.PI) / 44100f);
		pfIncr = (440 * PFSine.getOperationalSize()) / 44100f;

		float rivenTblSize = (float) (Math.PI * 2);
		float pfTblSize = PFSine.getOperationalSize();
		
		// metrics tests
		for (int i = 0; i < 6; i++)
		{
			play10Seconds(rivenIncr, rivenTblSize, false, true);
			System.out.println("Ri10SecComputationTime(nano): " + timeDuration);
			play10Seconds(pfIncr, pfTblSize, true, true);
			System.out.println("PF10SecComputationTime(nano): " + timeDuration);
			System.out.println();
		}

		// actual playback occurs here
		System.out.println("Riven's is playing...");
		play10Seconds(rivenIncr, rivenTblSize, false, false);
		System.out.println("Phil's is playing...");
		play10Seconds(pfIncr, pfTblSize, true, false);
		System.out.println("Done");
	}

	private static void play10Seconds(float incr, float tblSize, 
			boolean method, boolean timeIt) 
			throws LineUnavailableException
	{
		AudioFormat audioFmt = new AudioFormat(
				AudioFormat.Encoding.PCM_SIGNED, 
				44100, 16, 2, 4, 44100, false);
		
		Info info = new DataLine.Info(SourceDataLine.class, 
				audioFmt);
		
		SourceDataLine sdl = (SourceDataLine)AudioSystem.getLine(info);

		byte[] outBuffer = new byte[bufferSize];  
		
		sdl.open(audioFmt, bufferSize);
		sdl.start();
				
		int idx = 0;
		float cursor = 0;
		float normalizedVal = 0;
		int audioVal = 0;
		
		long startTime = System.nanoTime();
		
		for (int ii = 0; ii < 441_000; ii++)   // 10 seconds at 44100 fps
		{
			cursor += incr;
			if (cursor >= tblSize) cursor -= tblSize;
			normalizedVal = getAudioData(cursor, method);
			
			audioVal = (int)(normalizedVal * 32767);
		
			// Little-endian 16-bit sample, duplicated into the left and right channels.
			byte lo = (byte)audioVal;
			byte hi = (byte)(audioVal >> 8);
			outBuffer[idx++] = lo;	// left
			outBuffer[idx++] = hi;
			outBuffer[idx++] = lo;	// right
			outBuffer[idx++] = hi;
			if (idx >= bufferSize)
			{	
				if (!timeIt) sdl.write(outBuffer, 0, bufferSize);
				idx = 0;
			}
			
		// [EDIT: a duplicate "cursor += incr;" used to be here. Oops!]
			
		}
		if (!timeIt && (idx > 0)) sdl.write(outBuffer,  0, idx);
		
		timeDuration = System.nanoTime() - startTime;
		
		sdl.drain();
		sdl.close();
		sdl = null;
		
	}
	
	static float getAudioData(float cursor, boolean method)
	{
		if (method)
		{
			return PFSine.get(cursor);
		}
		else
		{
			return RivenSine.sin(cursor);
		}
	}
}

package fastSineTest;

public class RivenSine {
	  public static final float sin(float rad)
	   {
	      return sin[(int) (rad * radToIndex) & SIN_MASK];
	   }

	   public static final float cos(float rad)
	   {
	      return cos[(int) (rad * radToIndex) & SIN_MASK];
	   }

	   public static final float sinDeg(float deg)
	   {
	      return sin[(int) (deg * degToIndex) & SIN_MASK];
	   }

	   public static final float cosDeg(float deg)
	   {
	      return cos[(int) (deg * degToIndex) & SIN_MASK];
	   }

	   @SuppressWarnings("unused")
	   private static final float   RAD,DEG;
	   private static final int     SIN_BITS,SIN_MASK,SIN_COUNT;
	   private static final float   radFull,radToIndex;
	   private static final float   degFull,degToIndex;
//	   private static final float[] sin, cos;
	   public static final float[] sin, cos;

	   static
	   {
	      RAD = (float) Math.PI / 180.0f;
	      DEG = 180.0f / (float) Math.PI;

	      SIN_BITS  = 12;
	      SIN_MASK  = ~(-1 << SIN_BITS);
	      SIN_COUNT = SIN_MASK + 1;

	      radFull    = (float) (Math.PI * 2.0);
	      degFull    = (float) (360.0);
	      radToIndex = SIN_COUNT / radFull;
	      degToIndex = SIN_COUNT / degFull;

	      sin = new float[SIN_COUNT];
	      cos = new float[SIN_COUNT];

	      for (int i = 0; i < SIN_COUNT; i++)
	      {
	         sin[i] = (float) Math.sin((i + 0.5f) / SIN_COUNT * radFull);
	         cos[i] = (float) Math.cos((i + 0.5f) / SIN_COUNT * radFull);
	      }

	      // Four cardinal directions (credits: Nate)
	      for (int i = 0; i < 360; i += 90)
	      {
	         sin[(int)(i * degToIndex) & SIN_MASK] = (float)Math.sin(i * Math.PI / 180.0);
	         cos[(int)(i * degToIndex) & SIN_MASK] = (float)Math.cos(i * Math.PI / 180.0);
	      }
	   }
}

package fastSineTest;

public class PFSine {
	
	static final float[] data = makeSineWaveTable();
	
	public static float get(int i)
	{
		return data[i];
	}
	
	public static float get(float i)
	{
		final int idx = (int)i;
		
		return data[idx+1] * (i - idx) 
				+ data[idx] * ((idx+1) - i);
	}
	
	// mask used for looping through table (assuming 1024)
	public static int getMask()
	{
		return 0x3ff;
	}
	
	public static int getOperationalSize()
	{
		/*
		 * Table has one redundant record to allow 
		 * simpler LERP coding. For example, 1024 records
		 * to capture a single SINE cycle, but with a 
		 * 1025th record added that duplicates the first.
		 *     data[0] == data[1024] is TRUE
		 * So, since the get(float i) does a lookup of
		 * (i + 1), the last allowable position is 
		 *     i = data.length - 1; 
		 * Example: incoming < 1024.0 is OK for a table
		 * of length 1025 (1024 for cycle + 1 redundant).
		 */
		return (data.length - 1);
	}

	private static final int WAVE_TABLE_SIZE = 1025;
	
	public static float[] makeSineWaveTable()
	{	
		float[] audioData = new float[WAVE_TABLE_SIZE];
		
		for (int i = 0; i < WAVE_TABLE_SIZE; i++)
		{
			audioData[i] = (float)(Math.sin(2 * i *
					(Math.PI/(WAVE_TABLE_SIZE - 1))));
		}
		
		return audioData;
	}
}

Updated to include Icecore’s version

Isn’t that being managed inside the code, which would just mean that he’s managing it “better”? :wink:

I doubt it. At some point your cursor (oscillator phase?) will surely become of sufficient magnitude that the phase increment effectively becomes a no-op? Even before it becomes a no-op, it will become less accurate. I wonder how long it takes for that to become audible?
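The point about the increment becoming a no-op can be checked with a few lines of Java. This is only a sketch: the class and method names are made up, the increment is the A-440 value from the test program above, and it assumes a phase that is accumulated in a float and never wrapped back into range. It doubles a phase value until adding the increment no longer changes it, and prints the float spacing (ulp) at that point.

```java
final class PhaseDriftSketch {
    // Phase increment for A-440 at 44100 frames per second, as in the test program.
    static final float INCR = (float) ((440 * 2 * Math.PI) / 44100f);

    /** True if adding INCR to this phase is lost to float rounding. */
    static boolean incrementIsNoOp(float phase) {
        float next = phase + INCR;   // stored in a float so the sum is rounded to float precision
        return next == phase;
    }

    /** Smallest power of two at which the increment no longer registers. */
    static float firstStalledPowerOfTwo() {
        float p = 1f;
        while (!incrementIsNoOp(p)) {
            p *= 2f;
        }
        return p;
    }

    public static void main(String[] args) {
        float stall = firstStalledPowerOfTwo();
        System.out.println("increment:            " + INCR);
        System.out.println("increment is lost at: " + stall);
        System.out.println("float spacing there:  " + Math.ulp(stall));
    }
}
```

Well before the increment is lost completely, each addition is already being rounded to the nearest representable float, so the pitch drifts first. Wrapping the cursor back into one cycle, as the test program's loop does, keeps the phase small and sidesteps the problem.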

[OT] would love to see the code for the algorithm if you’re up for sharing it (in another thread)? Trying to get my head around FM (well, I assume you’re actually doing PM?) at the moment.

I actually don’t use any interpolation. The LUT is really big - I just pick the nearest value.

I just mask any higher bits off. Think of it as: (input * constant) % lut.length, but because of the ‘carefully’ chosen constant, lut.length is a power of two, which means I can do: (input * constant) & (lut.length - 1). By masking off higher bits I also don’t have to check the lower bound, allowing a branch-free calculation.
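The masking trick described above can be verified in isolation. A minimal sketch, assuming a 4096-entry table; the class and method names are invented for illustration. It compares the mask against Math.floorMod, and also shows why a plain % is not quite the same thing: for a negative index, % gives a negative remainder, while the mask wraps it into the table.

```java
final class MaskWrapSketch {
    static final int LUT_LENGTH = 4096;        // must be a power of two for the mask to work
    static final int MASK = LUT_LENGTH - 1;

    /** Branch-free wrap: keeps only the low bits of the index. */
    static int wrapWithMask(int index) {
        return index & MASK;
    }

    /** Mathematical modulo, always in [0, LUT_LENGTH). */
    static int wrapWithFloorMod(int index) {
        return Math.floorMod(index, LUT_LENGTH);
    }

    /** Java's remainder operator: negative for negative input. */
    static int wrapWithRemainder(int index) {
        return index % LUT_LENGTH;
    }

    public static void main(String[] args) {
        int[] samples = { 5, 4096, 4100, -1, -4097 };
        for (int index : samples) {
            System.out.println(index
                    + "  mask=" + wrapWithMask(index)
                    + "  floorMod=" + wrapWithFloorMod(index)
                    + "  %=" + wrapWithRemainder(index));
        }
    }
}
```

With a table length that is not a power of two, the mask no longer matches the modulo, which is why the table size has to be chosen with this in mind.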

Because I’m always suspicious of benchmarks written by *gasp* others… I rolled my own. There was just no way that my LUT code was ‘about as fast’ as some of the competition :point:

Lo and behold, the results are completely different :persecutioncomplex:

Executable benchmark code:
[x] http://pastebin.java-gaming.org/397b6012e341e

I test the performance of:
progressive sequence in the range -0.5pi … +0.5pi
progressive sequence in the range -8.0pi … +8.0pi
random values in a float[16k] in the range -0.5pi … +0.5pi
random values in a float[16k] in the range -8.0pi … +8.0pi

N.B.: All float[]s (including the LUTs) reside in L1 cache.


RUN 5, Java version: 1.8.0_45 (25.45-b02)

linear progression -0.5PI..+0.5PI:
	 java.math   59.895 ns/op
	 devmaster   58.804 ns/op
	 icecore     35.358 ns/op
	 riven        3.917 ns/op
	 kappa        3.932 ns/op

linear progression -8PI..+8PI:
	 java.math   82.075 ns/op
	 devmaster   61.671 ns/op
	 icecore     37.432 ns/op
	 riven        3.909 ns/op
	 kappa        4.152 ns/op // breaks beyond -2PI..+2PI

input float[]/L1 -0.5PI..+0.5PI:
	 java.math   59.413 ns/op
	 devmaster   58.514 ns/op
	 icecore     35.255 ns/op
	 riven        2.085 ns/op
	 kappa        2.704 ns/op

input float[]/L1 -8PI..+8PI:
	 java.math   89.183 ns/op
	 devmaster   63.673 ns/op
	 icecore     35.955 ns/op
	 riven        2.101 ns/op
	 kappa        4.669 ns/op // breaks beyond -2PI..+2PI

Linear progressions are calculated like this:


	private static float test***Linear(float min, float step, int count)
	{
		float sum = 0.0f;
		for(int i = 0; i < count; i++)
			sum += ***.sin(min + step * i);
		return sum;
	}

Reading float values as arguments:


	private static float test***Input(float[] values, int mask, int count)
	{
		float sum = 0.0f;
		for(int i = 0; i < count; i++)
			sum += ***.sin(values[i & mask]);
		return sum;
	}
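For anyone who wants to poke at the two loops without the pastebin, here is one way they could be wired up. This is a sketch under assumptions, not the benchmark that produced the numbers above: the SinFn interface is a made-up stand-in for the *** placeholder, and only java.lang.Math is plugged in. Going through an interface may itself change what the JIT does compared with a separate copy of the loop per implementation, so treat any timings from it with suspicion.

```java
import java.util.Random;

final class SinBenchSketch {
    /** Hypothetical stand-in for the "***" placeholder in the templates above. */
    public interface SinFn {
        float sin(float rad);
    }

    public static float testLinear(SinFn fn, float min, float step, int count) {
        float sum = 0.0f;
        for (int i = 0; i < count; i++)
            sum += fn.sin(min + step * i);
        return sum;   // returned so the loop cannot be discarded as dead code
    }

    public static float testInput(SinFn fn, float[] values, int mask, int count) {
        float sum = 0.0f;
        for (int i = 0; i < count; i++)
            sum += fn.sin(values[i & mask]);
        return sum;
    }

    public static void main(String[] args) {
        SinFn javaMath = rad -> (float) Math.sin(rad);

        // 16k random inputs in -0.5PI .. +0.5PI; a power-of-two length lets "i & mask" wrap.
        float[] values = new float[16 * 1024];
        Random rnd = new Random(42);
        for (int i = 0; i < values.length; i++)
            values[i] = (float) ((rnd.nextDouble() - 0.5) * Math.PI);

        int count = 1_000_000;
        float sink = 0f;
        for (int run = 1; run <= 5; run++) {   // the early runs mostly measure warm-up
            long t0 = System.nanoTime();
            sink += testInput(javaMath, values, values.length - 1, count);
            long t1 = System.nanoTime();
            System.out.printf("RUN %d: java.math %.3f ns/op%n", run, (t1 - t0) / (double) count);
        }
        System.out.println("(ignore) " + sink);   // keeps the accumulated results alive
    }
}
```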

Please point out all the glorious flaws in this micro-benchmark (and improve the code while you’re at it! :))


 -0.5PI .. +0.5PI   min err    max err    avg err    stddev
devmaster           0.000000   0.001090   0.000502   0.000318
icecore             0.000000   0.001255   0.000395   0.000319
riven               0.000002   0.002295   0.000968   0.000566
kappa               0.000000   0.056010   0.029677   0.019574

 -8.0PI .. +8.0PI   min err    max err    avg err    stddev
devmaster           0.000000   0.001091   0.000504   0.000318
icecore             0.000000   0.001255   0.000402   0.000322
riven               0.000000   0.002298   0.000974   0.000565
kappa               0.000000   ------------- // beyond supported range

Not as fast as I hoped )
(I think the magic is in the float % remainder operation.)


linear progression -0.5PI..+0.5PI:
	 java.math   68.584 ns/op
	 devmaster   30.111 ns/op
	 icecore     29.352 ns/op
	 riven       13.923 ns/op
	 kappa       10.247 ns/op

linear progression -8PI..+8PI:
	 java.math   96.521 ns/op
	 devmaster   31.088 ns/op
	 icecore     29.686 ns/op
	 riven       10.896 ns/op
	 kappa       10.368 ns/op

input float[]/L1 -0.5PI..+0.5PI:
	 java.math   73.849 ns/op
	 devmaster   34.602 ns/op
	 icecore     20.253 ns/op
	 riven        3.180 ns/op
	 kappa       14.639 ns/op

Riven, your table has 4 times more values than mine. I’m wondering if that compensates sufficiently for the lack of linear interpolation. There is a way to calculate and compare the errors, I’m sure.
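One way to compare the two is simply to sample both lookups against Math.sin and keep the worst absolute error. The sketch below is an approximation of the two approaches rather than the posted classes: it assumes a 4096-entry table sampled at bin centres and read without interpolation (no cardinal-direction patch), and a 1024-entry table with one duplicate entry read with linear interpolation. All names are invented for the example. As a rule of thumb, a nearest-value read is off by at most about half a table step, while linear interpolation of a sine is off by at most about the step squared divided by 8.

```java
final class LutErrorSketch {
    static final float TWO_PI = (float) (Math.PI * 2.0);

    // 4096 entries sampled at bin centres, read without interpolation.
    static final int NEAREST_SIZE = 4096;
    static final float[] NEAREST = new float[NEAREST_SIZE];

    // 1024 entries plus a duplicate of the first, read with linear interpolation.
    static final int LERP_SIZE = 1024;
    static final float[] LERP = new float[LERP_SIZE + 1];

    static {
        for (int i = 0; i < NEAREST_SIZE; i++)
            NEAREST[i] = (float) Math.sin((i + 0.5) / NEAREST_SIZE * 2.0 * Math.PI);
        for (int i = 0; i <= LERP_SIZE; i++)
            LERP[i] = (float) Math.sin(i * 2.0 * Math.PI / LERP_SIZE);
    }

    static float sinNearest(float rad) {
        return NEAREST[(int) (rad * (NEAREST_SIZE / TWO_PI)) & (NEAREST_SIZE - 1)];
    }

    static float sinLerp(float rad) {
        float pos = rad * (LERP_SIZE / TWO_PI);
        int whole = (int) pos;                 // rad is assumed non-negative here
        float alpha = pos - whole;             // fractional distance to the next entry
        int i = whole & (LERP_SIZE - 1);
        return LERP[i] * (1f - alpha) + LERP[i + 1] * alpha;
    }

    /** Largest absolute error against Math.sin over [0, 2PI). */
    static double maxError(boolean lerp, int samples) {
        double worst = 0.0;
        for (int s = 0; s < samples; s++) {
            float rad = (float) (s * 2.0 * Math.PI / samples);
            float approx = lerp ? sinLerp(rad) : sinNearest(rad);
            worst = Math.max(worst, Math.abs(approx - Math.sin(rad)));
        }
        return worst;
    }

    public static void main(String[] args) {
        System.out.println("nearest, 4096 entries: " + maxError(false, 100_000));
        System.out.println("lerp,    1024 entries: " + maxError(true, 100_000));
    }
}
```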

I should have noticed that all the operations were occurring within the square brackets!

I was going to use the bit masking idea for keeping the LUT cursor within bounds, but let it go when Java wouldn’t allow a byte to be anded with a float. I feel like a turtle among the gazelles. With some effort, I’ll look again to see if that is what you coded.

Adding lerping to Riven’s algorithm does not have any noticeable performance impact in my grass simulation, so I will roll with that. Adding lerping reduces the error to 1/200th of just reading a single value, allowing for a smaller lookup table more likely to be in the cache.


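	// Linearly interpolated lookup. Assumes sin[] holds SIN_COUNT + 1 entries
	// (the extra one duplicating the first) so that sin[i + 1] is always valid.
	public static final float sin(float rad) {
		float index = rad * radToIndex;

		// Fractional position between two samples. (int) truncates toward zero,
		// so alpha is negative for negative inputs and the blend extrapolates
		// slightly instead of interpolating.
		float alpha = index - (int)index;

		int i = (int)index & SIN_MASK;

		float sin1 = sin[i];
		float sin2 = sin[i + 1];

		return sin1 * (1f - alpha) + sin2 * alpha;
	}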

I also expanded the sin[] to hold an extra value when the last element of the array is indexed like philfrei did. IMO this is the best performance/quality tradeoff.
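For reference, a self-contained version of that idea might look like the following. It is a sketch, not the code from the grass simulation: it assumes the table is sampled at i rather than i + 0.5, adds the extra entry as a copy of the first, and uses Math.floor (my choice, which costs a little speed) so that the blend factor stays in [0, 1) for negative angles as well. The names mirror Riven's constants but the class itself is hypothetical.

```java
final class LerpSinSketch {
    static final int SIN_BITS = 12;
    static final int SIN_COUNT = 1 << SIN_BITS;
    static final int SIN_MASK = SIN_COUNT - 1;
    static final float RAD_TO_INDEX = SIN_COUNT / (float) (Math.PI * 2.0);

    // One extra entry, equal to the first, so SIN[i + 1] is valid for the last index.
    static final float[] SIN = new float[SIN_COUNT + 1];

    static {
        for (int i = 0; i <= SIN_COUNT; i++)
            SIN[i] = (float) Math.sin(i * 2.0 * Math.PI / SIN_COUNT);
    }

    static float sin(float rad) {
        float index = rad * RAD_TO_INDEX;
        int whole = (int) Math.floor(index);   // floor, not truncation, so negatives round down
        float alpha = index - whole;           // blend factor in [0, 1)
        int i = whole & SIN_MASK;              // wrap into the table without branching
        return SIN[i] * (1f - alpha) + SIN[i + 1] * alpha;
    }

    public static void main(String[] args) {
        float[] angles = { 0.2617993878f, 1.8325957146f, 4.9741883682f, -1.0f };
        for (float a : angles)
            System.out.println(a + "  lut=" + sin(a) + "  Math.sin=" + (float) Math.sin(a));
    }
}
```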

The LUT is 4096 elements, so 16KB (at most 5 pages).

I think memory optimisation is a bit premature :point: