Revisiting CLKernel.setArg

Hello!

First of all I want to express my gratitude for this project. It’s gotten me into Java, and it’s very easy to use. Thanks for that!

Now, originally I started with JOCL. With JOCL, setArg just takes buffers directly; the downside there was that it didn’t have any overloads for scalar types.

I’ve now converted to LWJGL.

So the compromise on this side seems to be the inability to pass buffer inputs for value arguments (e.g. a float4 constant parameter). Now, I’m fairly new to OpenCL and I’m fairly certain I’m not doing things optimally yet; however, I have hit the limit of 8 constant kernel arguments, and I would like to work around it simply by packing the scalars into groups of 4-wide floats/ints.

Can this be rectified? Is there some other more proper way to do this presently? I don’t think I’ve seen an example that uses this yet, and of course these types of parameters compile just fine.

Edit: for clarity

Hey Subterfuge,

The scalar setArg methods in CLKernel are just helper functions on top of the low-level CL binding, for convenience. In fact, there is no scalar version of clSetKernelArg in the OpenCL API, there’s only a function that takes an arbitrary void pointer. For buffer arguments, have a look at the CL10 class. There are overloaded clSetKernelArg methods in there that accept NIO buffers.
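To make the void-pointer model concrete, here’s a minimal sketch using plain java.nio (LWJGL’s BufferUtils.createFloatBuffer does the same direct, native-order allocation); the kernel and argument index mentioned in the comment are hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class Float4Arg {
    // Pack a float4 value into a direct, native-order buffer.
    // The driver sees only a (void*, size) pair; the buffer's remaining
    // byte count (16 here) becomes the arg_size passed to clSetKernelArg.
    static FloatBuffer packFloat4(float x, float y, float z, float w) {
        FloatBuffer buf = ByteBuffer.allocateDirect(4 * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        buf.put(x).put(y).put(z).put(w).flip();
        return buf;
    }

    public static void main(String[] args) {
        FloatBuffer scale = packFloat4(5.0f, 0.05f, 0.25f, 1.0f);
        // Then hand it to the NIO-buffer overload, e.g.:
        // CL10.clSetKernelArg(kernel, argIndex, scale);
    }
}
```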

Makes sense, thanks for your response on this, Spasi. Now I’m having trouble implementing it: I’m getting CL_INVALID_ARG_SIZE.


FloatBuffer ScaleBuffer = BufferUtils.createFloatBuffer(3);
ScaleBuffer.put(5.0f).put(0.05f).put(0.25f).flip(); // PostScale, NoiseScale, ColorScale

// Set the constant kernel arguments
int argnum = 0;
clDevice.GetKernel(KERNEL_NAME).setArg(argnum++, clDevice.GetBuffer("Input"));
clDevice.GetKernel(KERNEL_NAME).setArg(argnum++, clDevice.GetBuffer("Output"));
clDevice.GetKernel(KERNEL_NAME).setArg(argnum++, clDevice.GetBuffer("ColorOutput"));
CL10.clSetKernelArg(clDevice.GetKernel(KERNEL_NAME), argnum++, ScaleBuffer);
//clDevice.GetKernel(KERNEL_NAME).setArg(argnum++, 5.0f);
clDevice.GetKernel(KERNEL_NAME).setArg(argnum++, seed);
clDevice.GetKernel(KERNEL_NAME).setArg(argnum++, frequency);
clDevice.GetKernel(KERNEL_NAME).setArg(argnum++, lacunarity);
clDevice.GetKernel(KERNEL_NAME).setArg(argnum++, offset);
clDevice.GetKernel(KERNEL_NAME).setArg(argnum++, gain);
clDevice.GetKernel(KERNEL_NAME).setArg(argnum++, octaveCount);
clDevice.GetKernel(KERNEL_NAME).setArg(argnum++, noiseQuality);
clDevice.GetKernel(KERNEL_NAME).setArg(argnum++, NoiseGen.X_NOISE_GEN);
clDevice.GetKernel(KERNEL_NAME).setArg(argnum++, NoiseGen.Y_NOISE_GEN);
clDevice.GetKernel(KERNEL_NAME).setArg(argnum++, NoiseGen.Z_NOISE_GEN);
clDevice.GetKernel(KERNEL_NAME).setArg(argnum++, NoiseGen.SEED_NOISE_GEN);
clDevice.GetKernel(KERNEL_NAME).setArg(argnum++, SHIFT_NOISE_GEN);
clDevice.GetKernel(KERNEL_NAME).setArg(argnum, clDevice.GetBuffer("randomVectors"));

Fatfingered the numpad enter lol…


__kernel void Scale(global float4* input, global float4* output, global float4* colors, float3 scale,  int seed, float frequency, float lacunarity, float offset, float gain, int octaveCount, int noiseQuality,  int NOISE_GEN_X,  int NOISE_GEN_Y,  int NOISE_GEN_Z, int NOISE_GEN_SEED, int SHIFT_NOISE_GEN, global float4* randomVectors)

At any rate, from what I know about OpenCL, I should probably be packing these constant parameters into structures that hit the constant memory cache. In my final implementation the parameter list will be too long anyway; it already is! Strange things are happening: using local variables doesn’t work, but adding a parameter resolves it, while adding two parameters doesn’t. It really sounds like the constant cache is already full…

My parameter list is going to grow indefinitely, because the idea is to batch all the calls from the map of noise modules into one VLIW kernel. That means in the end I just need buffers anyway, plus some intelligent system to manage them on the host side.

Alright well I came back to this and here’s what I gathered.

- If I use float3, it throws CL_INVALID_ARG_SIZE.
- If I use float3 and make the FloatBuffer size(4), filling the w with 1.0f: GPU crash.
- If I use float3 and make the FloatBuffer size(4) and don’t fill the w: CL_INVALID_ARG_SIZE.
- If I use float4 and make the FloatBuffer size(4) and fill the w with 1.0f: works fine.

So I’m not sure what I’m doing wrong… I’m even attempting to use the clSetKernelArg overload that takes a size.
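For what it’s worth, the OpenCL C spec gives float3 the size and alignment of float4 (16 bytes), which would explain why only the size(4) buffer passes the arg-size check; only the GPU crash looks like a driver quirk. A sketch of the host-side packing this implies (plain java.nio, no LWJGL dependency; the helper name is made up):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class Float3Arg {
    // In OpenCL, sizeof(float3) == sizeof(float4) == 16 bytes.
    // The fourth component is padding the kernel never reads, but the
    // arg_size handed to clSetKernelArg must still be 16, not 12.
    static final int FLOAT3_ARG_SIZE = 4 * Float.BYTES; // 16

    static FloatBuffer packFloat3(float x, float y, float z) {
        FloatBuffer buf = ByteBuffer.allocateDirect(FLOAT3_ARG_SIZE)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        buf.put(x).put(y).put(z).put(0.0f).flip(); // w is padding; its value is ignored
        return buf;
    }
}
```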


clDevice.noiseParams = new NoiseParams(NoiseQuality.QUALITY_BEST);
scaleParams = new ScaleParams(5.0f, 0.05f);
ridgedMultiParams = new RidgedMultiParams(1.0f, 2.0f, 1.0f, 2.0f, 6, 234);

int argnum = 0;
clDevice.GetKernel(KERNEL_NAME).setArg(argnum++, clDevice.GetBuffer("Input"));
clDevice.GetKernel(KERNEL_NAME).setArg(argnum++, clDevice.GetBuffer("Output"));
clDevice.GetKernel(KERNEL_NAME).setArg(argnum++, clDevice.GetBuffer("ColorOutput"));
CL10.clSetKernelArg(clDevice.GetKernel(KERNEL_NAME), argnum, clDevice.noiseParams.Size); // <-- error here (CL_INVALID_ARG_VALUE)
CL10.clSetKernelArg(clDevice.GetKernel(KERNEL_NAME), argnum++, clDevice.noiseParams.Buffer); // <-- error here if the line above is commented out (CL_INVALID_ARG_SIZE)
CL10.clSetKernelArg(clDevice.GetKernel(KERNEL_NAME), argnum, scaleParams.Size);
CL10.clSetKernelArg(clDevice.GetKernel(KERNEL_NAME), argnum++, scaleParams.Buffer);
CL10.clSetKernelArg(clDevice.GetKernel(KERNEL_NAME), argnum, ridgedMultiParams.Size);
CL10.clSetKernelArg(clDevice.GetKernel(KERNEL_NAME), argnum++, ridgedMultiParams.Buffer);
clDevice.GetKernel(KERNEL_NAME).setArg(argnum, clDevice.GetBuffer("randomVectors"));


package libnoiseforjava.opencl;

import java.nio.ByteBuffer;

import libnoiseforjava.Interp;
import libnoiseforjava.NoiseGen;
import libnoiseforjava.NoiseGen.NoiseQuality;

import org.lwjgl.BufferUtils;

public class NoiseParams {
	public final long Size = clSize.clInt4 + clSize.clInt + clSize.clInt;
	public final ByteBuffer Buffer = BufferUtils.createByteBuffer((int)Size);
	
	public NoiseParams(NoiseQuality noiseQuality)
	{		
		Buffer.putInt(NoiseGen.X_NOISE_GEN).putInt(NoiseGen.Y_NOISE_GEN).putInt(NoiseGen.Z_NOISE_GEN).putInt(NoiseGen.SEED_NOISE_GEN);
		Buffer.putInt(NoiseGen.SHIFT_NOISE_GEN);
		
		switch (noiseQuality)
		{
			case QUALITY_FAST:
				Buffer.putInt(0);
				break;
			case QUALITY_STD:
				Buffer.putInt(1);
				break;
			case QUALITY_BEST:
				Buffer.putInt(2);
				break;
		}
		Buffer.flip();
	}
}
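A note on the Size computation above: the packed layout is 16 + 4 + 4 = 24 bytes, but without the packed attribute the CL compiler will typically round the struct up to a multiple of its largest member’s alignment (int4 = 16 bytes), giving 32. A hedged sketch of what a padded host-side buffer would look like (plain java.nio; the class and method names are made up for illustration):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class NoiseParamsPadded {
    // int4 (16) + int (4) + int (4) = 24 bytes packed, but the struct's
    // alignment is that of its largest member (int4 = 16 bytes), so a
    // non-packed NoiseParams is likely rounded up to 32 bytes.
    public static final int PACKED_SIZE  = 16 + 4 + 4;                   // 24
    public static final int ALIGNED_SIZE = ((PACKED_SIZE + 15) / 16) * 16; // 32

    public static ByteBuffer create(int x, int y, int z, int seed, int shift, int quality) {
        ByteBuffer buf = ByteBuffer.allocateDirect(ALIGNED_SIZE)
                .order(ByteOrder.nativeOrder());
        buf.putInt(x).putInt(y).putInt(z).putInt(seed); // int4 NOISE_GEN
        buf.putInt(shift);                              // int SHIFT_NOISE
        buf.putInt(quality);                            // int NOISE_QUALITY
        // The remaining 8 bytes are tail padding; clear() (rather than flip())
        // sets position 0 / limit = capacity, so all 32 bytes get sent.
        buf.clear();
        return buf;
    }
}
```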


typedef struct
{	
	float4 vScale;
	float nScale;
	//float cScale;
} ScaleParams;

typedef struct
{
	int4 NOISE_GEN;
	int SHIFT_NOISE;
	int NOISE_QUALITY;
} NoiseParams;

typedef struct
{
	float frequency;
	float lacunarity;
	float persistence;
	int octaveCount;
	int seed;
} PerlinParams;

typedef struct
{
	float frequency;
	float lacunarity;
	float offset;
	float gain;
	int octaveCount;
	int seed;
} RidgedMultiParams;

__kernel void Scale(__constant float4* input, __global float4* output, __global float4* colors, const NoiseParams noiseParams, const ScaleParams scale, const RidgedMultiParams rmParams, __constant float4* randomVectors)
...

Edited a couple times for clarity.

From the OpenCL spec:

[quote]If the argument is declared to be a pointer of a built-in scalar or vector type, or a user-defined structure type in the global or constant address space, the memory object specified as argument value must be a buffer object (or NULL).[/quote]

Hi

If a feature is missing, you can still make a request for enhancement. Sorry for the off topic.

Ignore my previous reply; it’s wrong. You use const on the struct, not __constant. What you’re doing should work, and I’ve successfully tested it on my setup (AMD).

First of all, drop the line passing clDevice.noiseParams.Size; it’s not needed. You should only use that overload to specify the size of __local buffers. The next line, the one that returns CL_INVALID_ARG_SIZE, is correct, but it might be failing for one of two reasons:

  1. You’re hitting a struct alignment issue. The CL compiler might be inserting extra padding somewhere in the struct, so it expects a larger buffer. Try making NoiseParams’ Buffer bigger (e.g. 32 bytes instead of 24), use .clear() instead of .flip() and see what happens. You can also try specifying the packed attribute in your kernel, like so:
typedef struct __attribute__ ((packed))
{
   int4 NOISE_GEN;
   int SHIFT_NOISE;
   int NOISE_QUALITY;
} NoiseParams;
  2. The CL implementation is buggy. Are you testing on AMD, NV or Intel? I found this post from back in April:

[quote]As far as passing in a struct not as an array, I have tried many different combinations to no avail.

I looked at the OpenCL conformance tests and couldn’t find any place that tests that feature. It’s possible that it’s broken in some implementations :frowning:
[/quote]
Try passing it as a __constant pointer and see what happens. Basically do what I suggested in my previous reply.

Spasi, thanks once again for your help. It’s a treat! :slight_smile:

Seems like the packed attribute resolves the sizing issue; however, it may really be true that structs can only be passed as __local args or buffers. After running in circles and grasping at straws for a couple of hours, I hit a situation that looked like the struct values were arriving uninitialized in the kernel. So finally I threw up my hands and set up a simple experiment to prove they had to be buffers.

I made the Scale kernel, which was currently producing a small green sphere, scale that sphere using a buffer of ScaleParams. This worked, so I will finish the implementation this way. It was the way to go in the long run anyway, since eventually I will have multiples of each type of struct (besides NoiseParams, which is just some base noise constants).
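For the record, the host side of that buffer-of-structs approach can be sketched like this (plain java.nio; the helper name and layout are assumptions, matching the packed ScaleParams struct above):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ScaleParamsBuffer {
    // Packed ScaleParams: float4 vScale (16) + float nScale (4) = 20 bytes.
    // Without __attribute__((packed)) the stride would round up to 32,
    // because float4 forces 16-byte alignment.
    static final int STRIDE = 16 + 4;

    // Pack N structs back to back. The resulting buffer is uploaded once
    // (e.g. via clCreateBuffer) and bound to the kernel as a
    // __constant/__global ScaleParams* argument.
    static ByteBuffer pack(float[][] vScales, float[] nScales) {
        ByteBuffer buf = ByteBuffer.allocateDirect(STRIDE * nScales.length)
                .order(ByteOrder.nativeOrder());
        for (int i = 0; i < nScales.length; i++) {
            for (float f : vScales[i]) buf.putFloat(f); // float4 vScale (x, y, z, w)
            buf.putFloat(nScales[i]);                   // float nScale
        }
        buf.flip();
        return buf;
    }
}
```

The key point is that the host-side stride must match whatever sizeof(ScaleParams) is on the device, which is why the packed attribute matters here too.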

Thanks again!