Objects vs mapped objects vs floats

bitshit · December 17, 2007, 11:14am

Been a while since I last did some serious Java coding, but I’m thinking of making a little graphics demo (demo as in demoscene).
As performance will be critical, I was wondering how to store, access and manipulate my data as efficiently as possible. I did a little search on this forum and found some posts on this, but they only confused me more:

In this post, it’s mentioned that having objects for models/polys and vertices should be the fastest, as using arrays will cause a lot of exception checks to be done:
http://www.java-gaming.org/forums/index.php?topic=11227.0

I did a little benchmark myself, and that indeed seems to be the case (using objects for models/polys/vertices was more than twice as fast as using a float[][][]). Of course this means a lot of little objects will be created, which seems to contradict this post:
http://www.java-gaming.org/forums/index.php?topic=14940.0

Mapped objects are also mentioned on this forum, but I couldn’t make out whether they’re better than the object approach, as it’s mentioned somewhere that they’re a bit slower than using a float[]:
http://www.java-gaming.org/forums/index.php?topic=12886.0
http://www.java-gaming.org/forums/index.php?topic=15662.0

So performance-wise, the object approach (a class for model, poly[] and vertex[]) would be the fastest?
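For reference, the object approach asked about here could look roughly like the sketch below. The class names (Vertex, Triangle, Model) are hypothetical, not taken from the linked threads, and each triangle owns its own three Vertex instances, so a model with n triangles allocates 4n + 1 small objects plus n + 1 arrays.

```java
// Hypothetical sketch of the "one object per model/triangle/vertex" layout.
final class Vertex {
    float x, y, z;

    Vertex(float x, float y, float z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }
}

final class Triangle {
    // Each triangle holds references to three separate Vertex objects.
    final Vertex[] vertices = new Vertex[3];
}

final class Model {
    final Triangle[] triangles;

    Model(int triangleCount) {
        triangles = new Triangle[triangleCount];
        for (int i = 0; i < triangleCount; i++) {
            triangles[i] = new Triangle();
            for (int v = 0; v < 3; v++) {
                triangles[i].vertices[v] = new Vertex(0f, 0f, 0f);
            }
        }
    }

    // Translates every vertex; each access follows Model -> Triangle -> Vertex references.
    void translate(float dx, float dy, float dz) {
        for (Triangle t : triangles) {
            for (Vertex v : t.vertices) {
                v.x += dx;
                v.y += dy;
                v.z += dz;
            }
        }
    }
}
```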

And last: as I’ll be doing a lot of vector operations, using SIMD instructions would be very welcome. I read that the JVM will use SSE instructions where possible, but not very efficiently. I found these threads with ideas for using a little JNI library to handle that:
http://www.java-gaming.org/forums/index.php?topic=12346.15
http://www.java-gaming.org/forums/index.php?topic=17199.msg135213#msg135213

Has anyone ever gotten some results from that? Is it feasible?

Thanks!

Martijn

bitshit · December 20, 2007, 1:16pm

Anyone??

Riven??

Riven · December 20, 2007, 2:59pm

Sorry, I missed your post.

I’ll respond when I’m back from school.

Riven · December 21, 2007, 7:29pm

Let me start by saying that the thread:
“Why mapped objects (a.k.a structs) aren’t the ultimate (performance) solution”

should pretty much be ignored, as the author doesn’t understand the topic very well, yet has a very strong opinion.

Basically, this is the deal:

As you’re pushing for performance, simply ignore the ClientVM.
I’ll only discuss what to do on the Hotspot Server VM.

[*] Stay away from nested arrays.
These are references to references to offsets to data (tiny objects, lots of overhead).

In C, a float[][][] will be 1 chunk of RAM; not so in Java.

[*] Stay away from Buffers as much as possible.
You’ll lose 10% to 30% performance on average, but that might be acceptable.

[*] NEVER use array-backed Buffers, at ANY TIME.
Having even 1 array-backed FloatBuffer around will make direct-FloatBuffer performance drop by a factor of 10.

[*] As you’re manipulating data meant for the graphics card, don’t put your data in tiny objects.
You’ll need to copy data from all over the place into your FloatBuffers.

[*] float[] performance is predictable; you don’t need to fine-tune your source code by trial and error to find the sweet spot.
FloatBuffer performance is VERY unpredictable: you’ll find yourself optimizing for a very specific JRE version (the one installed), and with tiny changes

in the ordering of your calculations, you’ll ‘randomly’ see performance drop or rise by a few dozen percent.

One extreme case: fb.get(i+0) was 40% faster than fb.get(i) for me in a certain case.

[*] When using Buffers, never use get() and put(), only get(i) and put(i), even if it makes your code a lot bigger.
You’ll get a massive performance gain, something like 2-3x IIRC.

[*] If the calculations take ‘enough’ time, do all your math in float[] and copy the result to a direct FloatBuffer before pushing it to the graphics card.
Otherwise do your calculations on a FloatBuffer, but again, never ever instantiate a float[]-backed one.

[*] Last but not least, don’t waste your time on sun.misc.Unsafe, so… no (faked) mapped objects either.
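The “stay away from nested arrays” advice usually means replacing a float[][][] with a single float[] and computing the offset yourself. A minimal sketch, assuming all three dimensions are fixed and known up front; FlatArray3D and its methods are hypothetical names, not something from the thread:

```java
// Hypothetical sketch: one contiguous float[] standing in for float[dimA][dimB][dimC].
final class FlatArray3D {
    final int dimB, dimC;
    final float[] data;

    FlatArray3D(int dimA, int dimB, int dimC) {
        this.dimB = dimB;
        this.dimC = dimC;
        this.data = new float[dimA * dimB * dimC];
    }

    // Row-major offset: data[index(a, b, c)] plays the role of nested[a][b][c].
    int index(int a, int b, int c) {
        return (a * dimB + b) * dimC + c;
    }

    float get(int a, int b, int c) {
        return data[index(a, b, c)];
    }

    void set(int a, int b, int c, float value) {
        data[index(a, b, c)] = value;
    }
}
```

Each access is one array lookup with one bounds check, instead of following a chain of array references as with float[][][].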

In any non-bottleneck code, use proper OO and tiny objects to keep it all manageable.
Don’t put any effort into optimizing where it’s not strictly required; it will only slow you down, up to the point where you ditch your demo.
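The get(i)/put(i) advice from the list above, as a sketch: the same scaling loop written once with relative and once with absolute access on a direct, native-ordered FloatBuffer. The class and method names are hypothetical, and this snippet only shows the two access styles; it does not measure the 2-3x difference recalled above, which applied to the JREs of that time.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

final class BufferAccess {
    // Direct, native-ordered FloatBuffer: 4 bytes per float.
    static FloatBuffer directFloats(int count) {
        return ByteBuffer.allocateDirect(count * 4)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    // Relative style: every get()/put() also advances the buffer's position.
    // Assumes dst.capacity() >= src.capacity(). clear() resets position/limit only, not the data.
    static void scaleRelative(FloatBuffer src, FloatBuffer dst, float s) {
        src.clear();
        dst.clear();
        while (src.hasRemaining()) {
            dst.put(src.get() * s);
        }
    }

    // Absolute style (the recommended one): get(i)/put(i, v) leave the position untouched.
    // Assumes dst.capacity() >= src.capacity().
    static void scaleAbsolute(FloatBuffer src, FloatBuffer dst, float s) {
        int n = src.capacity();
        for (int i = 0; i < n; i++) {
            dst.put(i, src.get(i) * s);
        }
    }
}
```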

lhkbob · December 22, 2007, 6:59pm

Are you saying that float[] actually performs better than FloatBuffers? Or am I just reading it wrong? When you say to stay away from Buffers, do you mean having variables declared with the abstract Buffer as their type, or do you mean staying away from any Buffer subclass in general?

Riven · December 22, 2007, 8:15pm

float[] is almost always faster than a direct, native-ordered FloatBuffer.

FloatBuffers are very rarely as fast as float[] (only in a trivial copying-loop).

try to ‘stay away from subclasses of Buffer’, if you want predictable behaviour and guaranteed performance.

What if a 3rd-party API makes 1 silly non-direct FloatBuffer? Then you’d be screwed big time, and might as well
ditch all your FloatBuffer code that requires performance.
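The “do all your math in float[] and copy the result to a direct FloatBuffer” advice could be sketched like this. UploadStaging is a hypothetical helper: the per-frame math writes into the float[], and the direct buffer is only touched by one bulk put(float[]) right before the data is handed to whatever OpenGL binding you use. No array-backed FloatBuffer is ever created.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical helper: compute in a float[], then copy once into a direct buffer.
final class UploadStaging {
    private final float[] work;        // all per-frame math happens here
    private final FloatBuffer upload;  // direct and native-ordered; only used for the bulk copy

    UploadStaging(int floatCount) {
        work = new float[floatCount];
        upload = ByteBuffer.allocateDirect(floatCount * 4)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    float[] work() {
        return work;
    }

    // Copies the finished float[] into the direct buffer and flips it so it can be read from position 0.
    FloatBuffer prepareUpload() {
        upload.clear();
        upload.put(work);
        upload.flip();
        return upload;
    }
}
```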

bitshit · December 24, 2007, 10:19am

Thanks for your insights Riven!

But I believe mapped objects have been suggested by others too?

This is no problem for my demo, as I can force it to start with the server VM… but will this change in the future? It’s kinda weird that they don’t include these optimisations in the client VM as well.

[quote]In any non-bottlenecks, use proper OO and tiny objects to keep it all manageable.
Don’t put any effort in optimizing where it’s not strictly required, it will only slow you down up to the point you ditch your demo
[/quote]
Well, it’s more kind of all or nothing… (I can’t optimize just a specific part) as I plan to develop this as a general library for future development, so I have to decide whether to use a clean OO model (objects for everything). But I guess I’m sticking to one object per model with a float[] for its polys and vertices. As the models can only consist of triangles, I can allocate one big float[] for every object (float[nr_triangles*3]).
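A possible shape for the “one object per model, one float[] for its triangles” plan. This is a sketch under an assumption the post doesn’t state: 3 vertices per triangle and 3 floats (x, y, z) per vertex, which gives 9 floats per triangle, so the array would be sized triangleCount * 9 rather than nr_triangles * 3. TriangleModel and its methods are hypothetical names.

```java
// Hypothetical sketch: one object per model, all triangle coordinates in a single float[].
// Assumes 3 vertices per triangle and 3 floats (x, y, z) per vertex.
final class TriangleModel {
    static final int FLOATS_PER_VERTEX = 3;
    static final int FLOATS_PER_TRIANGLE = 3 * FLOATS_PER_VERTEX;

    final int triangleCount;
    final float[] coords;

    TriangleModel(int triangleCount) {
        this.triangleCount = triangleCount;
        this.coords = new float[triangleCount * FLOATS_PER_TRIANGLE];
    }

    // Offset of vertex v (0..2) of triangle t inside coords.
    static int offset(int t, int v) {
        return t * FLOATS_PER_TRIANGLE + v * FLOATS_PER_VERTEX;
    }

    void setVertex(int t, int v, float x, float y, float z) {
        int o = offset(t, v);
        coords[o] = x;
        coords[o + 1] = y;
        coords[o + 2] = z;
    }

    // Translation is one linear pass over the array; no per-vertex objects involved.
    void translate(float dx, float dy, float dz) {
        for (int i = 0; i < coords.length; i += FLOATS_PER_VERTEX) {
            coords[i] += dx;
            coords[i + 1] += dy;
            coords[i + 2] += dz;
        }
    }
}
```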

princec · December 25, 2007, 6:25pm

Not that weird - the optimisations are time consuming and use a bunch more RAM.

All to be fixed in Java 7 with the tiered compiler. Yay!

Cas

SluX · January 8, 2008, 3:02pm

So if I’m getting it right, you suggest that:


class Synapse{
   float[] weights;
}

class Layer{
   Synapse[] synapses;
}

would be faster to work with than


class Layer{
  float[][] weights;
}

I’m talking about neural networks… I hope that example isn’t too confusing…

Riven · January 8, 2008, 7:58pm

Nah, it’s even a bit slower (in theory)… there is one more reference to look up.

It’s fastest to use 1 float[] with all data in it.

float weight = weightsInSynapses[synapsesInLayer * layerIndex + synapseIndex];
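Wrapped in a small class, the single-float[] layout from the line above might look like this. It assumes, as that formula does, one weight per synapse and the same number of synapses in every layer; FlatNetworkWeights and its methods are hypothetical names. A whole layer ends up as one contiguous slice of the array.

```java
// Hypothetical sketch of the one-float[] layout: one weight per synapse,
// every layer assumed to have the same number of synapses.
final class FlatNetworkWeights {
    final int synapsesInLayer;
    final float[] weightsInSynapses;

    FlatNetworkWeights(int layerCount, int synapsesInLayer) {
        this.synapsesInLayer = synapsesInLayer;
        this.weightsInSynapses = new float[layerCount * synapsesInLayer];
    }

    float weight(int layerIndex, int synapseIndex) {
        return weightsInSynapses[synapsesInLayer * layerIndex + synapseIndex];
    }

    void setWeight(int layerIndex, int synapseIndex, float w) {
        weightsInSynapses[synapsesInLayer * layerIndex + synapseIndex] = w;
    }

    // Sums one layer's weights by walking its contiguous slice of the array.
    float layerSum(int layerIndex) {
        float sum = 0f;
        int start = synapsesInLayer * layerIndex;
        for (int i = start; i < start + synapsesInLayer; i++) {
            sum += weightsInSynapses[i];
        }
        return sum;
    }
}
```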

You can optimize this further and further, but it’ll get harder very quickly.

Just make it work with proper OO first. And if you replace it, keep the slow code around for reference.

Linuxhippy · January 15, 2008, 12:12pm

I read a bit through recently fixed bugs/RFEs, and it seems that at least for the server compiler a lot of tuning has been done for ByteBuffers (in JDK6u4 and JDK7).
Is anybody willing to re-run the benchmarks?

lg Clemens