Overhead of 'local' allocations?

Hi community!

When I was young and used to code in C++, I loved being able to allocate objects locally on the stack, e.g.

CMatrix temp( getPosition() );

In Java, the direct counterpart would be

Matrix4f temp = new Matrix4f( getPosition() );

Well, it contains the bad word ‘new’, and I’m afraid of creating garbage that way. An alternative is to hold and reuse a preallocated instance:


private final Matrix4f mTemp = new Matrix4f();


...

    mTemp.set( getPosition() );

But this is ugly, not thread-safe, and ties up memory that is only needed temporarily.
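
To make the trade-off concrete, here is a minimal sketch of both variants side by side. `Vec3` and `Mover` are made-up stand-ins for `Matrix4f` and whatever class owns `getPosition()`; none of these names come from a real library.

```java
// Hypothetical owner of a position, showing both ways to get a temporary.
final class Mover {
    private final Vec3 position = new Vec3();

    // Scratch instance reused across calls: no garbage, but shared mutable state.
    private final Vec3 mTemp = new Vec3();

    Mover(float x, float y, float z) {
        position.x = x;
        position.y = y;
        position.z = z;
    }

    Vec3 getPosition() {
        return position;
    }

    // Variant 1: a fresh temporary per call. No shared scratch state, but it allocates.
    float scaledLengthSquaredWithNew(float s) {
        Vec3 temp = new Vec3(getPosition());
        temp.scale(s);
        return temp.lengthSquared();
    }

    // Variant 2: reuse the preallocated scratch object. No allocation, but two
    // threads calling this on the same Mover would overwrite each other's mTemp.
    float scaledLengthSquaredWithReuse(float s) {
        mTemp.set(getPosition());
        mTemp.scale(s);
        return mTemp.lengthSquared();
    }

    public static void main(String[] args) {
        Mover m = new Mover(1f, 2f, 2f);
        System.out.println(m.scaledLengthSquaredWithNew(2f));
        System.out.println(m.scaledLengthSquaredWithReuse(2f));
    }
}

// Hypothetical stand-in for a math type such as Matrix4f.
final class Vec3 {
    float x, y, z;

    Vec3() {}

    Vec3(Vec3 other) {
        set(other);
    }

    void set(Vec3 other) {
        x = other.x;
        y = other.y;
        z = other.z;
    }

    void scale(float s) {
        x *= s;
        y *= s;
        z *= s;
    }

    float lengthSquared() {
        return x * x + y * y + z * z;
    }
}
```

Both methods compute the same value; the only difference is where the temporary lives and who else can see it.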

Now, how big is the overhead of that special kind of ‘new’ for temporary objects? Can the compiler or the JVM determine that it’s local, allocate it on the stack and remove it quickly when the scope is left?

I read some things about HotSpot dealing with different kinds of objects concerning their lifetimes, but I’m not ready to understand it and apply it to this case. Anybody to help me out?

[quote]Can the compiler or the JVM determine that it’s local, allocate it on the stack and remove it quickly when the scope is left?
[/quote]
That determination is known as escape analysis, but I don’t think Sun’s current JVM implements it. I have read somewhere that some JVMs (possibly JET) already implement a simple form of escape analysis, so hopefully Sun’s VM team can catch up.
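
As a rough illustration of what such an analysis has to decide, here is a small sketch. `EscapeDemo` and `Point` are made-up names, and whether any particular JVM actually optimizes the first method is an assumption on my part, not something the code proves.

```java
final class EscapeDemo {
    // Hypothetical small value type used as a temporary.
    static final class Point {
        final int x, y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // A static field through which an object can outlive the method that made it.
    static Point lastSeen;

    // 'temp' is never stored, returned, or passed on, so it does not escape.
    // An escape-analyzing compiler could, in principle, keep it off the heap.
    static int manhattan(int x, int y) {
        Point temp = new Point(x, y);
        return Math.abs(temp.x) + Math.abs(temp.y);
    }

    // Here the object is published through a static field, so it escapes
    // and has to be a real heap object that the GC tracks.
    static int manhattanAndRemember(int x, int y) {
        Point p = new Point(x, y);
        lastSeen = p;
        return Math.abs(p.x) + Math.abs(p.y);
    }

    public static void main(String[] args) {
        System.out.println(manhattan(3, -4));
        System.out.println(manhattanAndRemember(3, -4));
    }
}
```

The two methods return the same number; the difference is only in whether the temporary is visible outside the call.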

[quote]Now, how big is the overhead of that special kind of ‘new’ for temporary objects? Can the compiler or the JVM determine that it’s local, allocate it on the stack and remove it quickly when the scope is left?

I read some things about HotSpot dealing with different kinds of objects concerning their lifetimes, but I’m not ready to understand it and apply it to this case. Anybody to help me out?
[/quote]
Check out section 7.6 in Jeff’s book.

There is also quite a bit of discussion on this in the old forum but it isn’t very organized. I’d start with a search for ‘nursery’.

Escape analysis is the way to go. My bet is that it’s going to be in 1.5.

My suspicion is that 90% of the garbage created in properly written Java applications (i.e. not the nasty hacks we game-types are being forced to use) is stack-allocatable and therefore should incur zero garbage collection penalty and have 100% deterministic allocation time. JET uses this technique to great effect, although I’ve yet to get some GC stats out of it (and besides, I’ve eliminated all my allocations now…) :-X

For now you should use the horrible preallocation hack because no matter how clever you are with the -X flags you still get unexpected (or worse, regular) pauses and it looks shite.
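
If you want a rough idea of how much collecting your own temporaries trigger, something like the following sketch can help. It assumes a JVM that ships `java.lang.management` (Java 5 or later), and `GcProbe`, `churn` and `totalCollections` are made-up names. Treat the numbers as a hint only: a JIT is free to optimize the allocation away, and the result depends heavily on heap settings.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcProbe {
    // Sums the collection counts of all collectors the JVM reports.
    // Collectors that report -1 (count undefined) are skipped.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Allocates one short-lived array per iteration (standing in for a
    // temporary matrix) and returns a checksum so the work is observable.
    static long churn(int iterations) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            float[] temp = new float[16];
            temp[i & 15] = i;
            checksum += (long) temp[i & 15];
        }
        return checksum;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        long checksum = churn(5_000_000);
        long after = totalCollections();
        System.out.println("checksum=" + checksum
                + ", collections during run=" + (after - before));
    }
}
```

Running the same loop with a preallocated array instead of `new float[16]` and comparing the collection counts gives a crude before/after picture.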

A side effect is that you should probably avoid using any of the other useful APIs in Java in your rendering code too as these have a nasty tendency of creating tons of garbage in the hope that one day something will come along and make it cost-free…

Cas :)

[quote]A side effect is that you should probably avoid using any of the other useful APIs in Java in your rendering code too as these have a nasty tendency of creating tons of garbage in the hope that one day something will come along and make it cost-free…
[/quote]
… yes, but I actually HAVE them anyway, not only rendering but also networking, GUI, logging,… where I suspect many more objects are allocated and discarded than in my own code.

So I’m particularly interested in how local objects perform compared to real heap objects.

Now I’ve learned that they obviously have to be handled by the GC.
I once read Jeff’s (and Steve Wilson’s, not to forget) book, including the chapter mentioned, but it was based on Java 1.3. Or even 1.2? Have things changed since then?

Yep, things have changed - we’ve now got the concurrent collector (-Xconcgc), which is the major difference between 1.3 and 1.4.

In theory it uses idle cycles to do bits of garbage collection. In practice it’s really only a great deal of use when there’s a spare processor where it really helps.

It’s still no match for escape analysis.

Most of the optimisations in the Sun GC target the more common Java use cases: big servers with massive heaps and lots of turmoil, and client Swing GUIs that spend a lot of time sitting relatively idle. Both situations are somewhat different from what we generally want for games.

Cas :)

[quote]In theory it uses idle cycles to do bits of garbage collection. In practice it’s really only a great deal of use when there’s a spare processor where it really helps.
[/quote]

I wonder how Intel’s “Hyperthreading” chips perform with concurrent GC. Has anybody ever tried it?

It all depends on how the JVM uses the L1 & L2 caches when it’s doing GC.

Cas :)