Simplex noise, experiments towards procedural generation

@RobinB - I’ll try it. I wasn’t paying close enough attention and mis-remembered .getDataElements() as being your suggestion. Thanks for the reminder.

@Roquen - That would be interesting. I’d like to test it.
This source, correct?


And the usage is:

    noiseVal = Simplex3DNoise.eval(x, y, z);

…where each input is restricted to positive floats.

The code I’m using returns a double, so yours should provide a pickup on that basis alone! It’s not clear to me that doubles are “cost-effective” for this use.

I’ve been doing this for testing: use a util.Timer() that is set to a small repeat-time increment, e.g., 15 msec, and let the various methods run. If the procedure takes longer than the gaps, then the actual refresh rate (which I will clock with nanoTime) will reflect this.
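
Roughly, the test harness looks like this (a minimal sketch; the class and renderFrame() names are just placeholders, not the actual SiVi code):

    import java.util.Timer;
    import java.util.TimerTask;

    public class RefreshClock {

        private long lastFrame = System.nanoTime();

        public void start() {
            Timer timer = new Timer();
            // Request an update every 15 msec; if a frame takes longer, the
            // measured gap between frames will show it.
            timer.scheduleAtFixedRate(new TimerTask() {
                @Override
                public void run() {
                    renderFrame();
                    long now = System.nanoTime();
                    System.out.printf("frame: %.2f msec%n", (now - lastFrame) / 1000000.0);
                    lastFrame = now;
                }
            }, 0, 15);
        }

        private void renderFrame() {
            // noise evaluation + image update goes here
        }
    }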

I’ll try to get to this before Saturday.

Roquen, did you get any further checking the suggestion from Spasi/StefanG on your thread? Maybe I can test that too.


Oops. Maybe not. I don’t have OpenGL running here, yet.

Defects at high frequencies? Well, that should be expected, yes? If the frequency gets higher than half the simplex cell size can represent? (Not sure the Nyquist concept holds here.) Also, I don’t want to be using high frequencies that produce detail smaller than a pixel anyway. I’m pretty sure that when I allow scaling of 128 in SiVi I’ve gone beyond pixel widths in detail, though I haven’t done the math to prove it.

Yeah, that’s it. You simply call eval as in your example. Inputs only need to be restricted to positive values if the compile-time switch is flipped; the default is fine for negative input.

The high-frequency defects only appear (that I ever noticed) once you’re zoomed out into the white-noise-looking range, and they stem from using a cheap hashing function. The current version has an improved hash function which removes the issue…so no, it’s not related to the Nyquist rate. The improved hash is “off” by default…I really should move the switch boolean simpleHash to the top of the code; it’s currently next to the hash function. It’s possible that the cheap hash function does cause some lower-level defects that I’ve simply never noticed, which is why a replacement version is in place.

@Roquen

Very simple test: I substituted Simplex3DNoise.eval() (yours) for SimplexNoise.noise() (Stefan Gustavson’s implementation) at the one spot where the noise function was being invoked. Nothing else was changed.

For the graphic size (1000 x 200 pixels), Stefan’s averaged 73 msec per frame, and yours averaged 97 msec per frame.

When I switch Stefan’s to 2D (simply by eliminating the 3rd parameter), the rate is 49 msec per frame.

util.Timer was sending commands to update the graphic every 15 msec, but the size pretty much guaranteed the process would take significantly longer.

In comparison, a 256 x 128 graphic can zip along and give a readout of about 16.75 msec for yours and 15.65 for Stefan’s 3D.

I haven’t tried RobinB’s suggestion yet. That’s next.

Trying RobinB’s suggestion.

I’m substituting this line:

    outData = ((DataBufferInt)image.getRaster().getDataBuffer()).getData();

in place of this line:

    outData = (int[]) raster.getDataElements(0, 0, width, height, outData);

And the bonus for using the first is that I can eliminate this line from the end of the update loops:

    raster.setDataElements(0, 0, width, height, outData);
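
For context, the overall pattern now looks roughly like this (a minimal sketch, assuming a TYPE_INT_RGB image; the method name and grayscale mapping are made up for illustration):

    import java.awt.image.BufferedImage;
    import java.awt.image.DataBufferInt;

    public class DirectRasterSketch {

        // Writes grayscale noise values (-1..1) straight into the image's backing array.
        static void fillGray(BufferedImage image, float[] noise) {
            int w = image.getWidth();
            int h = image.getHeight();
            // Live backing array of the image; no setDataElements() call is needed afterwards.
            int[] outData = ((DataBufferInt) image.getRaster().getDataBuffer()).getData();
            for (int i = 0; i < w * h; i++) {
                int gray = (int) ((noise[i] + 1f) * 127.5f);   // map -1..1 to 0..255
                outData[i] = (gray << 16) | (gray << 8) | gray;
            }
        }
    }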

Result with 1000 x 200 animation, 3D:
using get/setDataElements: frame averages about 74 msecs
using RobinB’s method: 71 or 72 msecs

There is some pickup in speed.

With 2D of the same size image:
get/setDataElements: 50 msec
RobinB’s: a little over 48 msec

With 256 x 128, 3D:
get/setDataElements: 15.625 msec
RobinB’s: 15.625 msec

Thanks for the timing. Mine is intended to be understandable rather than fast, but I’m still surprised at the speed difference.

You are quite welcome!

Java continually surprises me, when it comes to trying to “optimize” code.

A couple more tests, for grins:

Baseline:

1000 x 200 pixels, 3D call, using RobinB’s image handling method: 71 or 72 msec roughly

eliminating ONLY the Simplex call, just setting noiseVal to 0: 15.625 msec

eliminating all image mgmt and calculation with the noiseVal, leaving just the Simplex 3D (Gustavson) call: 69 msec

same with just a Simplex2D call (Gustavson): a touch over 46 msec


So it seems to me that if there is a place where further improvements should be sought, it would be in trying to make Gustavson’s or your implementation faster. Given that yours looked fine using floats, I wonder if there would be a pickup from converting Gustavson’s to use floats? Maybe it would be too little to matter. But who knows.

I guess it is good to have a target value and “business need” before putting too much time into optimization investigations.

Really, if you care about speed, move to the GPU. I’d need to look at the code, but it’s highly unlikely that changing his code from doubles to floats would break anything. Moreover, it’s highly unlikely that it would have any impact on quality.

I thought the int array would have a bigger impact on speed, but it’s still a little better, so I was right :D.
I have tried some noise functions on the GPU (with OpenCL); they were not really much faster than noise on the CPU.
I have also tried converting everything to floats; it gave no noticeable speedup.
Bit masking and predefining variables of the noise function also didn’t really work out.
I guess the best way is to run some worker threads in the background to preload the heightmaps at runtime.
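
Something like this, roughly (just a sketch of the idea; the class and method names are made up, not from any code above):

    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class HeightmapPreloader {

        private final ExecutorService workers = Executors.newFixedThreadPool(2);

        // Submit a chunk's heightmap for background generation; call get() on
        // the returned Future later, when the chunk is actually needed.
        public Future<float[]> preload(final int chunkX, final int chunkZ, final int size) {
            return workers.submit(new Callable<float[]>() {
                public float[] call() {
                    float[] heights = new float[size * size];
                    // fill heights[] with noise values for (chunkX, chunkZ) here
                    return heights;
                }
            });
        }

        public void shutdown() {
            workers.shutdown();
        }
    }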

@RobinB – I think your method is a significant improvement. If you subtract the time spent entirely in the Simplex call itself (69 msec) from the total with your method (71 msec) and from the total without it (74 msec), the image-handling overhead drops from roughly 5 msec to roughly 2 msec, so your method is about two to three times as fast as the alternative. The only reason the overall pickup is so modest is that the Simplex call itself is such a big percentage of the total process.

@Roquen – I know nothing about how to shift programming to the GPU. I suppose I should Google it. If you have any suggestions for tutorials along these lines that you personally found more helpful than others, I’d love to hear about them!

OpenCL is one solution; I’ll give the code as far as I got.
It saves a few ms (for me, it took something like 300-400 ms instead of 500).
It just makes everything much more complicated.

Baseclass:


    static{
        try {
            CL.create();
        } catch (LWJGLException ex) {
            ex.printStackTrace();
        }
    }
    
    protected CLContext context;
    protected CLCommandQueue queue;
    protected CLProgram program;
    protected CLKernel kernel;
    protected CLDevice device;
    protected String name, data;
    
        
    public Computing_base(String name, String data) {
        this.name = name;
        this.data = data;
                 
        create();

        allocate();
        
        loadKernel();
    }
    
    protected final boolean create(){
        try {
            CLPlatform platform = CLPlatform.getPlatforms().get(0);
            List<CLDevice> devices = platform.getDevices(CL_DEVICE_TYPE_GPU);                        
            context = CLContext.create(platform, devices, null, null, null);
            
            device = devices.get(0);
            queue = clCreateCommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE, null);
            
        } catch (LWJGLException ex) {
            ex.printStackTrace();
            return false;
        }
        
        return true;
    }
    
    protected final void loadKernel(){
        // program/kernel creation
        program = clCreateProgramWithSource(context, data, null);
        Util.checkCLError(clBuildProgram(program, device, "", null));
        
        // the name argument has to match a kernel function name in the OpenCL source
        kernel = clCreateKernel(program, name, null);
    }
    
    public void Dispose(){
        clReleaseKernel(kernel);
        clReleaseProgram(program);
        clReleaseCommandQueue(queue);
        clReleaseContext(context);
    }

    
    protected abstract void allocate();
    public abstract void execute();
    
    
    
    protected static FloatBuffer toFloatBuffer(float[] floats) {
        FloatBuffer buf = BufferUtils.createFloatBuffer(floats.length).put(floats);
        buf.rewind();
        return buf;
    }

    protected static void print(FloatBuffer buffer) {
        for (int i = 0; i < buffer.capacity(); i++) {
            System.out.print(buffer.get(i)+" ");
        }
        System.out.println("");
    }

My noise subclass:


public class Computing_noise extends Computing_base {

    private FloatBuffer a, b, answer;
    
    private CLMem aMem, bMem, answerMem;
    
    public Computing_noise(){
        super("perlin3d", FileManager.readFileAsString("D://noise2.txt"));
    }
    
    @Override
    protected void allocate() {

    }
    
    public void setDimensions(int z, float step, int w, int h, int d){
        a = toFloatBuffer(new float[]{ z, step, w, h, d });
        aMem = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, a, null);
        clEnqueueWriteBuffer(queue, aMem, 1, 0, a, null, null);

        /* Prepare gradients. */
        int gsize = (w + 1) * (h + 1) * (d + 1);
        int size = (int) (w * h);
        Random rng = new Random();
        
        float[] vectors = new float[3 * gsize];
        for (int i = 0; i < vectors.length; i++) {
            vectors[i] = rng.nextFloat() * 2f - 1f;
        }
        b = toFloatBuffer(vectors);
        answer = BufferUtils.createFloatBuffer(size);

        /* Allocate memory */
        bMem = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, b, null);
        clEnqueueWriteBuffer(queue, bMem, 1, 0, b, null, null);
        answerMem = clCreateBuffer(context, CL_MEM_WRITE_ONLY | CL_MEM_COPY_HOST_PTR, answer, null);
        clFinish(queue);
    }

    @Override
    public void execute() {
        // execution
        PointerBuffer kernel1DGlobalWorkSize = BufferUtils.createPointerBuffer(1);
        kernel1DGlobalWorkSize.put(0, answer.capacity());
        kernel.setArg(0, bMem);
        kernel.setArg(1, aMem);
        kernel.setArg(2, answerMem);
        clEnqueueNDRangeKernel(queue, kernel, 1, null, kernel1DGlobalWorkSize, null, null, null);

        // read the results back
        clEnqueueReadBuffer(queue, answerMem, 1, 0, answer, null, null);
        clFinish(queue);

//        print(a);
//        System.out.println("+");
//        print(b);
//        System.out.println("=");
        //print(answer);
    }
    
    public float[] getResult(){
        float[] s = new float[answer.capacity()];
        answer.flip();
        answer.get(s);
        return s;
    }  
}

Noise code (noise2.txt):


/* Returns a pointer to the gradient vector for the given grid point. */
global const float *get_gradient(global const float *gradients,
                                 const float w, const float h, const float *c)
{
    int base = (c[0] + c[1] * w + c[2] * w * h) * 3;
    return gradients + base;
}

float calc_magnitude(global const float *g, const float *c, const float *p)
{
    return g[0] * (p[0] - c[0]) + g[1] * (p[1] - c[1]) + g[2] * (p[2] - c[2]);
}

float weight(const float *c, const float *p)
{
    float t0 = 1 - fabs(c[0] - p[0]);
    float t1 = 1 - fabs(c[1] - p[1]);
    float t2 = 1 - fabs(c[2] - p[2]);
    return (3 * pown(t0, 2) - 2 * pown(t0, 3))
         * (3 * pown(t1, 2) - 2 * pown(t1, 3))
         * (3 * pown(t2, 2) - 2 * pown(t2, 3));
}

kernel void
perlin3d(global const float *gradients,
         global const float *params,
         global float *value)
{
    /* Fetch and calculate parameters. */
    unsigned int id = get_global_id(0);
    float w = params[2]; // area width  (x)
    float h = params[3]; // area height (y)
    float d = params[4]; // area depth  (z)

    float x = fmod(id, w);       // x-position to sample
    float y = floor(id / w);      // y-position to sample    
    float z = d;                    // z-position to sample
    const float p[] = {x, y, z};

    /* Calculate grid corners. */
    const float c000[] = {floor(x), floor(y), floor(z)};
    const float c001[] = {c000[0] + 0, c000[1] + 0, c000[2] + 1};
    const float c010[] = {c000[0] + 0, c000[1] + 1, c000[2] + 0};
    const float c011[] = {c000[0] + 0, c000[1] + 1, c000[2] + 1};
    const float c100[] = {c000[0] + 1, c000[1] + 0, c000[2] + 0};
    const float c101[] = {c000[0] + 1, c000[1] + 0, c000[2] + 1};
    const float c110[] = {c000[0] + 1, c000[1] + 1, c000[2] + 0};
    const float c111[] = {c000[0] + 1, c000[1] + 1, c000[2] + 1};

    /* Find each of the grid gradients. */
    global const float *g000 = get_gradient(gradients, w, h, c000);
    global const float *g001 = get_gradient(gradients, w, h, c001);
    global const float *g010 = get_gradient(gradients, w, h, c010);
    global const float *g011 = get_gradient(gradients, w, h, c011);
    global const float *g100 = get_gradient(gradients, w, h, c100);
    global const float *g101 = get_gradient(gradients, w, h, c101);
    global const float *g110 = get_gradient(gradients, w, h, c110);
    global const float *g111 = get_gradient(gradients, w, h, c111);

    /* Dot products. */
    float m000 = calc_magnitude(g000, c000, p);
    float m001 = calc_magnitude(g001, c001, p);
    float m010 = calc_magnitude(g010, c010, p);
    float m011 = calc_magnitude(g011, c011, p);
    float m100 = calc_magnitude(g100, c100, p);
    float m101 = calc_magnitude(g101, c101, p);
    float m110 = calc_magnitude(g110, c110, p);
    float m111 = calc_magnitude(g111, c111, p);

    /* Weights. */
    float w000 = weight(c000, p);
    float w001 = weight(c001, p);
    float w010 = weight(c010, p);
    float w011 = weight(c011, p);
    float w100 = weight(c100, p);
    float w101 = weight(c101, p);
    float w110 = weight(c110, p);
    float w111 = weight(c111, p);

    value[id] =
          w000 * m000
        + w001 * m001
        + w010 * m010
        + w011 * m011
        + w100 * m100
        + w101 * m101
        + w110 * m110
        + w111 * m111;
}
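
And roughly how it gets called (a sketch only; the dimension values are arbitrary and the frame loop is omitted):

    // One slice of noise: set the dimensions, run the kernel, read back the floats.
    Computing_noise noise = new Computing_noise();
    noise.setDimensions(0, 1f, 256, 128, 64);   // z-slice, step, width, height, depth
    noise.execute();
    float[] values = noise.getResult();         // one float per pixel (width * height)
    noise.Dispose();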

I have a half completed demo applet for the wiki page. I’ll pastebin it when I get a chance.

A thought I just had: perhaps for animation, one could reference the Perlin space only once every 8 frames or so, and lerp between those samples for the intermediate frames.
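
Something like this, maybe (a rough sketch; the class and array names are placeholders):

    public class NoiseKeyframes {

        private final float[] keyA;   // noise sampled at frame n
        private final float[] keyB;   // noise sampled at frame n + 8

        public NoiseKeyframes(float[] keyA, float[] keyB) {
            this.keyA = keyA;
            this.keyB = keyB;
        }

        // frameOffset runs 0..7 between the two keyframes.
        public float[] lerpFrame(int frameOffset) {
            float t = frameOffset / 8f;
            float[] out = new float[keyA.length];
            for (int i = 0; i < out.length; i++) {
                out[i] = keyA[i] + t * (keyB[i] - keyA[i]);
            }
            return out;
        }
    }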

Am a little intimidated by RobinB’s code. Probably it will make more sense if I look up OpenCL and find out what that is… :stuck_out_tongue:

If noise is being used for visuals…I can’t think of a good reason to use OpenCL.

Personally I’d say GLSL is the obvious solution…
http://www.geeks3d.com/20110317/shader-library-simplex-noise-glsl-opengl/


http://glsl.heroku.com/e#1450.0
http://glsl.heroku.com/e#1000.2
http://www.kamend.com/2012/06/perlin-noise-and-glsl/

Well, it’s only a little faster, so I suppose it’s of little use.
I don’t get how noise with GLSL can be fast.
I mean, each frame everything needs to get recomputed, so how can that be fast?
It’s not an argument, just a request for info :slight_smile:

Render to a texture once. Save the results. Draw the texture over and over again. No need to recompute anything.

Generally speaking the GPU will vastly outperform the CPU in calculation speed. As mentioned, you don’t need to recompute every frame (unless that’s desired, i.e. animated noise). You can also use textures to store gradient tables or other data, instead of recomputing it unnecessarily. They have the added bonus of things like wrap modes (GL_REPEAT outside of index bounds) and bilinear sampling.

Also, regarding the actual graphics pipeline… If you create an array of RGB pixels on the CPU, it then needs to be uploaded to a texture on the GPU. Since texture transfer is generally a bit slow, it makes more sense to keep it all on the GPU.

Rendering to a texture is fine if a texture is good enough…sadly, it frequently isn’t. There are multiple issues here. One is that a texture will get filtered, which is of no use in some cases, whereas sampling the noise directly gets you the accurate value regardless of where the object resides in the scene. Another issue is the domain of the sampled surface, such as accurate fragment coverage of the surface of a sphere (see HEALPix).

Yes, but whether you need to redraw something or can reuse precalculated noise, it would be the same for OpenGL and OpenCL.

OpenCL = general purpose, OpenGL = specific purpose. So, it depends on how you define ‘same’.