I personally would not base my choice of OpenCL bindings on this, but I think we all know this is not really about OpenCL; it is about OpenGL. OpenCL should be discussed in its own thread. It is just a proxy here. Michael, sorry if I am forcing your hand too early or messing up your schedule. Just say if you wish to postpone. As long as you run both sides yourself on the same hardware, you do not have to worry about distributing anything earlier than you had planned.
The JNA side is using OpenCL4Java, which is what JavaCL is built on. We use two names so that we know which one we are referring to when talking. I am probably the only person who uses OpenCL4Java directly.
I ran this on a Snow Leopard MacBook with 2 GPUs and an Intel® Core™2 Duo CPU P8800 @ 2.66GHz. I tried to set out some rules, listed right in the source, but if you can improve them, do so. I am not the referee, and my schedule is pretty tight.
When I did a run of 1000 loops (each loop is 5 calls), I got an average of 0.19629501 ms per loop. When I did 1 million, I got 0.018008577. The million-loop figure is probably the better one if you are doing 1000 OpenGL calls just for 1 frame, but play with it.
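To put the two figures on a per-call footing, here is a small sketch that divides each average by the 5 calls in a loop. The class and method names are made up for illustration. It works out to roughly 39 microseconds per call for the 1000-loop run and roughly 3.6 microseconds per call for the 1-million-loop run, about an 11x gap. My guess is that the short run is dominated by JIT and library warm-up, but that is an assumption, not something I measured.

```java
public class PerCallOverhead {
    // Convert an average per-loop time in milliseconds to microseconds per call.
    static double microsPerCall(double msPerLoop, int callsPerLoop) {
        return msPerLoop * 1000.0 / callsPerLoop;
    }

    public static void main(String[] args) {
        int callsPerLoop = 5; // int, long, size_t, plus two calls for the string query

        double shortRun = microsPerCall(0.19629501, callsPerLoop);  // 1000 loops
        double longRun  = microsPerCall(0.018008577, callsPerLoop); // 1 million loops

        System.out.println("1000 loops, microseconds per call: " + shortRun);
        System.out.println("1M loops, microseconds per call:   " + longRun);
        System.out.println("ratio short/long:                  " + (shortRun / longRun));
    }
}
```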
`package whatever;
import java.nio.*;
import com.sun.jna.*;
import com.sun.jna.ptr.*;
import com.ochafik.lang.jnaerator.runtime.NativeSize;
import com.ochafik.lang.jnaerator.runtime.NativeSizeByReference;
import com.nativelibs4java.opencl.library.*;
/**
 * JNA version, using OpenCL4Java (the low-level bindings for JavaCL). Add this jar to the project to run:
 * http://nativelibs4java.sourceforge.net/maven/com/nativelibs4java/opencl4java/1.0-SNAPSHOT/opencl4java-1.0-SNAPSHOT-shaded.jar
 *
 * Goal: test the call overhead of JNI vs JNA. OpenCL has device info calls which are
 * short in duration. They DO NOT touch the GPUs. The type of data returned can be found
 * by running http://nativelibs4java.sourceforge.net/webstart/OpenCL/HardwareReport.jnlp
 * Not every possible query is performed, only one of each return type. Too much work for
 * all. Control the number of iterations using LOOP_COUNT.
 * Start the clock only after the platform and device have been obtained.
 *
 * Rules:
 * - Platform must be NVidia 195 or 196 if Windows. Win7 64-bit if possible.
 * - Do not even bother creating a context or command queue.
 * - The avg time/loop should be compared on the exact same hardware. The value itself is
 *   NOT important, only the difference in values.
 * - MUST "look" at the value, since this could be different.
 * - MUST include any methods which one would reasonably need to call inside the
 *   loop, e.g. getPointer() methods for JNA.
 * - Assigning the return code is required, but the actual checking can be commented out.
 */
public class JNIvsJNAviaOpenCL {
    static int LOOP_COUNT = 1000000; // 1M
    static float NANOS_PER_MILLI = 1000000F;

    public static void main(String[] argv) {
        // get platform, usually only one, unless mixing NVidia & ATI GPUs
        OpenCLLibrary.cl_platform_id[] platformArray = new OpenCLLibrary.cl_platform_id[1];
        int err = OpenCLLibrary.INSTANCE.clGetPlatformIDs(1, platformArray, null);
        if (err != OpenCLLibrary.CL_SUCCESS)
            throw new RuntimeException("failed to get platform " + err);

        // get any device; the device itself is not important, but we need something to query
        OpenCLLibrary.cl_device_id[] deviceArray = new OpenCLLibrary.cl_device_id[1];
        err = OpenCLLibrary.INSTANCE.clGetDeviceIDs(platformArray[0], OpenCLLibrary.CL_DEVICE_TYPE_ALL, 1, deviceArray, null);
        if (err != OpenCLLibrary.CL_SUCCESS)
            throw new RuntimeException("failed to get device " + err);

        // assorted vars declared outside the loop
        OpenCLLibrary.cl_device_id dev = deviceArray[0]; // do not want to index dev array every call
        long cummTime = 0L;
        long start;

        // CL_DEVICE_VENDOR_ID is a cl_uint, and IntByReference holds 4 bytes
        NativeSize szInt = new NativeSize(4);
        IntByReference valInt = new IntByReference();
        int lookedAtInt;

        NativeSize szLong = new NativeSize(8);
        LongByReference valLong = new LongByReference();
        long lookedAtLong;

        NativeSize szSizeT = new NativeSize(8); // assumes a 64-bit size_t
        NativeSizeByReference valSizeT = new NativeSizeByReference();
        long lookedAtSizeT;

        NativeSize szString = new NativeSize();
        NativeSizeByReference nCharBuf = new NativeSizeByReference();
        ByteBuffer valStringBuf;
        int length;
        String lookedAtString;

        long force_JVM_to_do = 0;
        for (int i = 0; i < LOOP_COUNT; i++) {
            start = System.nanoTime();

            // int based info queries
            err = OpenCLLibrary.INSTANCE.clGetDeviceInfo(dev, OpenCLLibrary.CL_DEVICE_VENDOR_ID, szInt, valInt.getPointer(), null);
            // if (err != OpenCLLibrary.CL_SUCCESS)
            //     throw new RuntimeException("failed int query " + err);
            lookedAtInt = valInt.getValue();

            // long based info queries
            err = OpenCLLibrary.INSTANCE.clGetDeviceInfo(dev, OpenCLLibrary.CL_DEVICE_MAX_MEM_ALLOC_SIZE, szLong, valLong.getPointer(), null);
            // if (err != OpenCLLibrary.CL_SUCCESS)
            //     throw new RuntimeException("failed long query " + err);
            lookedAtLong = valLong.getValue();

            // size_t based info queries
            err = OpenCLLibrary.INSTANCE.clGetDeviceInfo(dev, OpenCLLibrary.CL_DEVICE_IMAGE2D_MAX_WIDTH, szSizeT, valSizeT.getPointer(), null);
            // if (err != OpenCLLibrary.CL_SUCCESS)
            //     throw new RuntimeException("failed size_t query " + err);
            lookedAtSizeT = valSizeT.getValue().longValue();

            // string based info queries (2 calls: the first to find out the size, the second to get the value)
            err = OpenCLLibrary.INSTANCE.clGetDeviceInfo(dev, OpenCLLibrary.CL_DRIVER_VERSION, szString, null, nCharBuf);
            // if (err != OpenCLLibrary.CL_SUCCESS)
            //     throw new RuntimeException(ErrorDesc.getErrorDesc(err));
            length = nCharBuf.getValue().intValue();
            szString.setValue(length);
            valStringBuf = NIO_Utils.getByteBuffer(length);

            // call again to get the actual value
            err = OpenCLLibrary.INSTANCE.clGetDeviceInfo(dev, OpenCLLibrary.CL_DRIVER_VERSION, szString, Native.getDirectBufferPointer(valStringBuf), null);
            // if (err != OpenCLLibrary.CL_SUCCESS)
            //     throw new RuntimeException("failed string query " + err);
            // else
            lookedAtString = NIO_Utils.toString(valStringBuf);

            cummTime += System.nanoTime() - start;
            // use every value so the JVM cannot optimize the reads away
            force_JVM_to_do += lookedAtInt - lookedAtLong + lookedAtSizeT - lookedAtString.length();
        }
        System.out.println("Avg ms per loop: " + (cummTime / (LOOP_COUNT * NANOS_PER_MILLI)));
        System.out.println("ignore: " + force_JVM_to_do);
    }
}
`
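Since the average drops so much between 1000 and 1 million loops, one option is to run an untimed warm-up pass before the measured pass. Below is a minimal sketch of that idea with no OpenCL in it. `WarmupTiming`, `avgMillisPerLoop` and the stand-in workload are all hypothetical names of mine, and the warm-up count of 100,000 is an arbitrary choice. You would swap the stand-in lambda for the five `clGetDeviceInfo` calls.

```java
import java.util.function.LongSupplier;

public class WarmupTiming {
    // Runs body `loops` times, timing each iteration with System.nanoTime(),
    // and returns the average time per loop in milliseconds.
    // Results are accumulated into sink[0] so the JVM cannot discard the work.
    static double avgMillisPerLoop(LongSupplier body, int loops, long[] sink) {
        long total = 0L;
        for (int i = 0; i < loops; i++) {
            long start = System.nanoTime();
            long value = body.getAsLong();
            total += System.nanoTime() - start;
            sink[0] += value;
        }
        return total / (loops * 1000000.0);
    }

    public static void main(String[] args) {
        long[] sink = new long[1];

        // Stand-in workload (an assumption); replace with the real native calls.
        LongSupplier body = () -> Long.toString(System.nanoTime()).length();

        // Untimed warm-up pass: the returned average is deliberately ignored.
        avgMillisPerLoop(body, 100000, sink);

        // Measured pass.
        double avg = avgMillisPerLoop(body, 1000000, sink);
        System.out.println("Avg ms per loop after warm-up: " + avg);
        System.out.println("ignore: " + sink[0]);
    }
}
```

If both the JNI and JNA sides use the same warm-up count, the comparison stays fair under the rules in the source, since only the difference between the two averages matters.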