Bounds checks and structs

I especially do not want reference types included in MappedObjects; it is an absolute can of worms. The whole idea of a MappedObject is to manipulate primitive data in a native byte buffer. By all means construct Color3fs from it but don’t confuse the simplicity of the design with trying to automatically populate references.

No significant language spec change is required, by the way; can anyone see anywhere the language spec would actually need to change? There may be a slight addition to mandate how the VM classloads such an object, but at the end of the day it can all be implemented by trivial bytecode rewriting at classload time (e.g. replace every read of field X with a getX() method call, and synthetically create a getX() method that accesses the ByteBuffer by ordinary means), although that would mean there wouldn't be any performance benefit, as the bounds checks would likely remain.
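To make the rewriting idea concrete, here is a sketch of what a classloader might synthesize for a mapped object with float fields x, y, z. The class and method names are assumptions for illustration; no MappedObject API actually exists, so this is just ordinary Java showing the shape of the rewritten code:

```java
import java.nio.ByteBuffer;

// Hypothetical result of the classload-time rewrite described above.
// Reads of a source-level field 'x' would become calls to getX().
class MappedVec3 {
    static final int SIZEOF = 3 * 4; // three floats, 4 bytes each

    final ByteBuffer data; // the backing native byte buffer
    int index;             // which element this object currently views

    MappedVec3(ByteBuffer data) { this.data = data; }

    // synthetic accessors reading the buffer "by ordinary means",
    // so the usual ByteBuffer bounds checks still apply
    float getX() { return data.getFloat(index * SIZEOF + 0); }
    float getY() { return data.getFloat(index * SIZEOF + 4); }
    float getZ() { return data.getFloat(index * SIZEOF + 8); }

    void setX(float v) { data.putFloat(index * SIZEOF + 0, v); }
    void setY(float v) { data.putFloat(index * SIZEOF + 4, v); }
    void setZ(float v) { data.putFloat(index * SIZEOF + 8, v); }
}
```

As the post says, this gets the semantics but not the speed: every accessor still goes through the buffer's bounds checks.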

The best bit about the design is that it looks like all other Java code and doesn’t behave in any way differently.

The one thing you might possibly consider is adding an annotation to the primitive fields you want to map, e.g. @mapped, but that is probably just adding confusion to a very simple concept. The less there is to understand, the less will go wrong and the less it will be abused.

Cas :slight_smile:

Not reference types: inlined structures (so you can reuse the .normalize() method from a Vector3f struct on three x,y,z values which happen to sit inside a bigger structure). But as I have said, it is totally optional; the basic idea is to have the fields listed explicitly and have a bytecode rewriter do all the magic behind the scenes at class load time.
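The kind of reuse being described could be sketched without any language change at all, as a routine operating on any three consecutive x,y,z floats wherever they sit inside a larger structure. The class name and layout here are assumptions:

```java
import java.nio.FloatBuffer;

// Sketch: one normalize routine reusable on any x,y,z triple at an
// arbitrary offset inside a bigger flat structure.
class Vec3Ops {
    static void normalize(FloatBuffer data, int offset) {
        float x = data.get(offset);
        float y = data.get(offset + 1);
        float z = data.get(offset + 2);
        float invLen = (float) (1.0 / Math.sqrt(x * x + y * y + z * z));
        data.put(offset,     x * invLen);
        data.put(offset + 1, y * invLen);
        data.put(offset + 2, z * invLen);
    }
}
```

An inlined-struct facility would essentially let the compiler generate these offsets for you instead of you threading them through by hand.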

hear, hear :wink:

I did this calculation for every element:

c.x = a.x * (1.0-weight) + b.x * weight
c.y = a.y * (1.0-weight) + b.y * weight
c.z = a.z * (1.0-weight) + b.z * weight

a, b and c were in different data structures:


FloatBuffer bbA, bbB, bbC;
Vector3f[] vcA, vcB, vcC;
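For reference, the inner loops for the Vec3[] and FloatBuffer variants look roughly like this (a sketch of the benchmark kernel; the Vec3 class and method names are assumptions):

```java
import java.nio.FloatBuffer;

class LerpKernels {
    static class Vec3 { float x, y, z; }

    // POJO version: c = a*(1-weight) + b*weight, element-wise
    static void lerp(Vec3[] a, Vec3[] b, Vec3[] c, float weight) {
        float inv = 1.0f - weight;
        for (int i = 0; i < a.length; i++) {
            c[i].x = a[i].x * inv + b[i].x * weight;
            c[i].y = a[i].y * inv + b[i].y * weight;
            c[i].z = a[i].z * inv + b[i].z * weight;
        }
    }

    // FloatBuffer version: same math via absolute get/put,
    // 3 consecutive floats per vector
    static void lerp(FloatBuffer a, FloatBuffer b, FloatBuffer c, float weight) {
        float inv = 1.0f - weight;
        for (int i = 0; i < a.capacity(); i++) {
            c.put(i, a.get(i) * inv + b.get(i) * weight);
        }
    }
}
```

The FloatBuffer version pays a bounds check on every get/put, which matches the roughly 5x slowdown in the numbers below.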

The results:

Running benchmark with 128 3d vecs...
math on Vec3[]:           3.9ms      32200 / sec <---
math on FloatBuffer:     20.4ms       6200 / sec
math on unsafe buffer:    3.3ms      38600 / sec <---
math on unsafe struct:    9.1ms      14000 / sec

Running benchmark with 256 3d vecs...
math on Vec3[]:           8.2ms      30900 / sec <---
math on FloatBuffer:     40.7ms       6200 / sec
math on unsafe buffer:    7.9ms      32200 / sec <---
math on unsafe struct:   16.2ms      15700 / sec

Running benchmark with 512 3d vecs...
math on Vec3[]:          16.5ms      31000 / sec <---
math on FloatBuffer:     74.9ms       6800 / sec
math on unsafe buffer:   14.4ms      35400 / sec <---
math on unsafe struct:   28.0ms      18200 / sec

Running benchmark with 1024 3d vecs...
math on Vec3[]:          31.5ms      32400 / sec <---
math on FloatBuffer:    150.2ms       6800 / sec
math on unsafe buffer:   29.3ms      34900 / sec <---
math on unsafe struct:   54.6ms      18700 / sec

Running benchmark with 2048 3d vecs...
math on Vec3[]:          66.4ms      30800 / sec <---
math on FloatBuffer:    299.4ms       6800 / sec
math on unsafe buffer:   58.9ms      34700 / sec <---
math on unsafe struct:  107.0ms      19100 / sec

Running benchmark with 4096 3d vecs...
math on Vec3[]:         144.1ms      28400 / sec
math on FloatBuffer:    593.7ms       6800 / sec
math on unsafe buffer:  127.3ms      32100 / sec <---
math on unsafe struct:  215.9ms      18900 / sec

Running benchmark with 8192 3d vecs...
math on Vec3[]:        1551.9ms       5200 / sec
math on FloatBuffer:   1212.3ms       6700 / sec
math on unsafe buffer:  276.3ms      29600 / sec <---
math on unsafe struct:  467.8ms      17500 / sec

Running benchmark with 16384 3d vecs...
math on Vec3[]:        3480.1ms       4700 / sec
math on FloatBuffer:   2666.8ms       6100 / sec
math on unsafe buffer:  960.2ms      17000 / sec
math on unsafe struct: 1193.1ms      13700 / sec

* Riven mumbles something about cache misses…

The unsafe struct was implemented as one object used as ‘sliding window’ struct backed by an unsafe buffer.

All the results were averaged over 8 runs, after warming up 8 runs.

Raw performance is only achievable on native buffers if you do your own pointer arithmetic. FloatBuffers are a no-go for performance, neither for direct calls nor for backing a struct. The unsafe 'sliding window' struct cuts performance roughly in half, but that could be acceptable as it is less error-prone and simply more convenient.
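For those who haven't seen the pattern, a 'sliding window' struct along these lines can be built on sun.misc.Unsafe: one object whose accessors read raw memory at a current address, repositioned per element. This is a sketch with assumed names, not Riven's actual benchmark code, and the usual caveats apply (sun.misc.Unsafe is unsupported API, and there are no bounds checks at all):

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// One reusable view object sliding over a raw off-heap float buffer.
class UnsafeVec3Window {
    static final Unsafe UNSAFE = getUnsafe();
    static final int SIZEOF = 3 * 4; // x, y, z

    final long base; // start of the allocation
    long address;    // element currently in view

    UnsafeVec3Window(int count) {
        base = UNSAFE.allocateMemory((long) count * SIZEOF);
        address = base;
    }

    void moveTo(int index) { address = base + (long) index * SIZEOF; }

    // no bounds checks: plain pointer arithmetic
    float x() { return UNSAFE.getFloat(address + 0); }
    float y() { return UNSAFE.getFloat(address + 4); }
    float z() { return UNSAFE.getFloat(address + 8); }
    void x(float v) { UNSAFE.putFloat(address + 0, v); }
    void y(float v) { UNSAFE.putFloat(address + 4, v); }
    void z(float v) { UNSAFE.putFloat(address + 8, v); }

    void free() { UNSAFE.freeMemory(base); }

    static Unsafe getUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (Exception e) { throw new RuntimeException(e); }
    }
}
```

The halved throughput presumably comes from the extra moveTo() bookkeeping and the accessor calls compared to straight pointer arithmetic in a flat loop.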

If mapped objects were one day implemented in Java, with a ByteBuffer acting as a heap, there would be no need for bytecode weaving and we'd have the same performance as class fields.

Could you make source code ready for download?

sourcecode

Run it with:

Thread.currentThread().setPriority(Thread.MAX_PRIORITY - 1);

for (int i = 0; i < 4; i++)
{
   benchmark(1024 / 8);
   benchmark(1024 / 4);
   benchmark(1024 / 2);
   benchmark(1024 * 1);
   benchmark(1024 * 2);
   benchmark(1024 * 4);
   benchmark(1024 * 8);
   benchmark(1024 * 16);
}

It sounds beguilingly simple, except that it would be heinously complex to implement and difficult to understand. The Vector3fs: when are they constructed? What about arrays of Vector3fs? What about the array itself? How does that fit with the fact that it's completely different from the way everything else works in Java? What about the fact that arrays stored as fields inside a mapped object hold no reference back to the object that holds them, and so would not prevent the backing ByteBuffer from being garbage collected out from under them? Etc.

Just Keep It Simple. One simple rule: primitive types only. Reference type fields left untouched.
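Under that rule, a mapped class might look like the following. Note that MappedObject is the proposed base class being discussed in this thread, not an existing API, so the stub here exists only to make the sketch compile:

```java
// Stub standing in for the proposed (non-existent) MappedObject base class.
class MappedObject {}

class Vector3f { float x, y, z; }

class Particle extends MappedObject {
    float x, y, z;     // primitives: would be mapped into the backing buffer
    int color;         // primitive: also mapped
    Vector3f velocity; // reference type: left untouched, an ordinary field
}
```

The rule is trivially checkable at classload time: only the primitive fields get rewritten to buffer accesses, and everything else behaves exactly as normal Java.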

Cas :slight_smile:

Ok, let’s forget about inlining non-primitive types.

Do you think a bytecode-weaving version for primitive-only structures is worth creating? It is certainly more complicated to implement than a source code generator, and the only thing it is really needed for is faking direct field access instead of going through getters/setters. Will it be worth anything except as a proof of concept?

I think the bytecode-weaving version should be created anyway, for lightweight JVMs and for debugging purposes. But really the whole point is that a suitably tricky JVM would be able to spot the special case of extending MappedObject and emit different compiled code.

Cas :slight_smile:

You’re not the only one. I think someone is watching me too. At least the editor of the java.net front page seems to keep an eye on my postings. :wink:

Most of the discussion focused on the performance comparison between math on POJOs (plain old Java objects) and math on unsafe buffers.

I have no experience at all with OpenGL, but I would like to enlarge the test case here to get a better understanding of the total benefit of using unsafe buffers vs. POJOs, using java.nio to push vertices to the graphics card (like LWJGL does with glVertexPointer).

What’s the difference between these processes?

  0) Math on an array of POJO Vector3Ds + a call to glVertex(…) for each vertex
  1) Math on an array of POJO Vector3Ds + filling a FloatBuffer + one call to glVertexPointer
  2) Math on an unsafe buffer of floats (manipulated through a Vector3DSlidingWindow) + one call to glVertexPointer

Based on the results presented by Riven, I have the feeling that method 2) should be much more efficient, because the unsafe buffer is manipulated directly and there is only one OpenGL call to pass the whole array…

Is that right?