need for speed, java vs c++ and the stack

Java needs to be faster, the question is how. I have a suggestion:

Allow objects to be created on the stack (static storage) for functions. This means that you don’t need to allocate and initialize all the objects on the heap every time you call a function since their there when the program starts. theres another good advantage to this, garbage collection occurs when objects are created, this means that the less you use the ‘new’ keyword to allocate/instantiate the object the less the GC gets called. As a result, code execution:

  1. Is faster
  2. Does not call GC as much
  3. Does not use as much heap space. The savings are usualy not that great, but possibly huge if your using many tables.

is static storage at all possible in the future for Java?

Yes, it is done through the use of escape analysis. I found two RFE’s for it:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4809540

and

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4453604

Although the evaluation make it sound like it’s not very high on the priority list (and one of the RFE’s is closed).

  • elias

[quote]Java needs to be faster, the question is how.
[/quote]
If you change the need bit to could be I would agree with you. This obsession with speed when it comes to games just goes too far some times. As it stands Java performance is acceptable, and there is no driving need for better performance. Performance improvements are always welcome, of course.

I disagree. Point for point:

  1. Regarding allocations, it is because of the GC that heap allocation in Java is actually faster than it is in C++. Where stack vs. heap allocations may show a benefit in C++, there would likely be little benefit in Java.

When it comes to acessing variables, you could argue that stack memory is less likely to cause cache misses (which is a big reason why accessing stack variables is can be more efficient). But as a counter one could argue that there’s no safe way to protect the stack from being overwritten (which would seem to go against the tenets of Java).

Regardless, there are well known practices to follow when coding in Java to avoid problems with excessive allocation. And there are excellent tools that can help you solve heap/GC related issues.

Take a look at this research paper, Memory System Performance of Programs with Intensive Heap Allocation.

  1. Secondly, you want the GC to be called often. The more often it is called, the less likely a GC-related pause.

  2. If you find that you are using too much heap, switching some of it to the stack isn’t going to help you. Weak references could come in handy in this case.

Ultimately, I see this sort of thing as a micro-optimization at best. At worst, it’s an attempt to fix a problem which doesn’t exist.

Why can’t you protect the stack? I see no reason that on-stack allocation would weaken the stack protection.

  • elias

No, no it doesn’t weaken stack protection. The stack is already weakly protected. When you consider that the stack is constantly being written to, that the stack pointer (esp) is constantly being moved around, and that any old piece of assembly can manipulate the stack pointer directly… it is inherently unsafe.

I’m not saying it’s a big issue, but a possible counter argument. I only mentioned it as it was the first thing to pop into my head (I read a debate recently where a couple of people advocated avoiding stack allocations altogether even in C++ programs) and it seemed relevant.

Escape analysis is excellent and my experiences with Jet show that it really does help in a lot of situations. It is also, ultimately, faster than any new/GC algorithm can be, if only be a little weeny bit.

Structs would probably be a better place for games devs to vote their efforts on though, as it’s got a more direct impact on how we write our programs.

Cas :slight_smile:

/me thinks Cas should change his sig to be “Structs-Man”

Or possibly “Super-Structs!”

[quote]2) Secondly, you want the GC to be called often. The more often it is called, the less likely a GC-related pause.
[/quote]
A GC related pause would be caused by too many objects on the heap, if your not allocating them on the heap, how will you get a GC related pause? I could imagen a function declared as stack function or something like that, to tell the RTI to turn off the GC for the length of the function.

If your doing 3D rotation matix operations at 70 fps with 100000 3 vertex polygons, it might make a difference, because the function for rotation would be called for each vertex, in this case 300000 times for all vertices per fps, wouldn’t it be wasteful to use the GC 300000 * 70 times per second each time your making in instance of matrix inside your function? You could say that a solution would be to create the matrix as a static instance, but isn’t this a bad design because your using code farther from where it should be used? placing the stack instance inside a function would be more clear from a client programmers perspective.

How do you make sure that this stack instance isn’t passed to a place where it is stored and has to live longer than the function call?

I’m not sure if I understand what you are saying. In what situation does this happend? And why would it be bad?

[quote] If your doing 3D rotation matix operations at 70 fps with 100000 3 vertex polygons, it might make a difference, because the function for rotation would be called for each vertex, in this case 300000 times for all vertices per fps, wouldn’t it be wasteful to use the GC 300000 * 70 times per second each time your making in instance of matrix inside your function?
[/quote]
If your entire argument for stack-based objects is based on this reasoning, then you’ve got an almost impossible struggle ahead of you.

That example smacks of inexperienced programming. Rule number one of realtime programming is to not allocate and free objects. It doesn’t matter what language you’re using, that’s just really bad design and a major performance loss. You create as many global objects as you need at startup and never play with them again. Now, you don’t have the overhead either way of allocating objects or of running the GC/free over them to reclaim memory.

For example, we’ve just debugged a DIS implementation from a programmer that used exactly your approach. For every packet, there were a number of objects created (somewhere between 20 and 50 depending on the packet). After stripping out all this code and turning into a collection of variables, we got almost 2 orders of magnitude speed improvement - ie from about 50 entities a second to just under 5K handled. What’s more, the rendering now became liquid smooth because we no longer had to deal with GC pauses dealing with all these sorts of short-term created objects. Just because you could place these objects on a stack, really doesn’t alter the fundamental principle of having to spend CPU cycles both creating and cleaning them up afterwards, regardless of where they are created.

[quote] Just because you could place these objects on a stack, really doesn’t alter the fundamental principle of having to spend CPU cycles both creating and cleaning them up afterwards, regardless of where they are created.
[/quote]
That’s wrong, if an object is purely stack based (and the constructor empty), you don’t waste any time creating it (it’s already there when you enter the function and the stack frame is set up) or free’ing it (it’s gone with the stack frame). And the most important argument is that you get to loose all those ugly static global variables you spoke about, and move them into the functions that need them. So while it might be bad programming to do the matrix stuff Thinking speaks about today, it’s only so because of not having stack based allocation in the JVM.

Herkules: That’s what escape analysis is for. You analyze which objects are never escaping from the function, and only pick them for potential stack allocation. Any object that can possibly escape (and outlive the function as you put it) have to be created on the heap.

  • elias

Elias is correct.

Cas :slight_smile:

Thanks for the replies everyone. Elias, I can’t view those pages at the moment because I don’t yet have a account on java.sun.com. This escape analyisis sounds interesting, is it an elegant technique? If I can’t use the stack for creating objects that never leave the scope of the function, should I make these objects private static?
BTW, what’s an RFE?

[quote]Thanks for the replies everyone. Elias, I can’t view those pages at the moment because I don’t yet have a account on java.sun.com. This escape analyisis sounds interesting, is it an elegant technique? If I can’t use the stack for creating objects that never leave the scope of the function, should I make these objects private static?
BTW, what’s an RFE?
[/quote]
I don’t know the exact algorithm(s) to do escape analysis, but the main point of them is to track any allocation to see if the resulting object is ever used in a function call (inlined and recursive calls don’t count) or returned from the function, and probably other cases I have overlooked.

And yes, private static (and probably final) is the way to go if your app is singlethreaded. Ugly, but necessary :confused:

An RFE is a “Request For Enhancement”.

  • elias

Ggogle for escape analysis, particularly look out for university CS courses. It’s simple and elegant, but you will probably get confronted by mathematical notation quite quickly (because that lets you explain it more concisely).

[quote] That’s wrong, if an object is purely stack based (and the constructor empty), you don’t waste any time creating it (it’s already there when you enter the function and the stack frame is set up) or free’ing it (it’s gone with the stack frame). And the most important argument is that you get to loose all those ugly static global variables you spoke about, and move them into the functions that need them.
[/quote]
If the objects are “already there” how do you think they got there? Did they just magically appear? No, of course not. There still needs to be CPU time spent actually creating those objects on the stack, initialising variables inside them and so forth. It still does not change the unalterable fact that some CPU time must be spent creating and destroying those objects, regardless of whether they are on the stack or heap.

Allocation of objects was never the problem. It is the deallocation of those objects. It takes an awful lot of work tuning the GC params for a particular application to make sure that all these little weeny objects get zapped in a timely fashion and without incurring a glitch. Escape analysis just makes this tuning job vastly easier by removing the vast majority of little allocations/deallocations, and it also helps a lot with code clarity by removing the temptation to stick loads of private static final Vector3fs in classes all over the place.

Not to mention that particular code practise is lethal in multithreaded programming…

As another consideration we must remember that GC becomes orders of magnitude more complex and nasty and slow the more threads and processors we’ve got. Anything to prevent things getting into that code path in the first place will be a boon. I expect escape analysis may eventually have its most profound performance impact serverside but we’ll have to wait and see.

I understand some of the Sun VM engineers are particularly keen to implement it as it’s quite an easy performance tweak to implement relative to many of the other crazier ones.

Cas :slight_smile:

to Thinking
Who told you that stack is faster?

look at this code:

push eax
pop eax

and compare it to this:

mov bssSegmentVariableA, eax
mov eax, bssSegmentVariableA

What would be faster?

basically push would be:
mov [esp], eax
sub esp, 4
(It would be possibly slightly optimized, hopefully, you never know with current industry.)

I agree that more smart interaction to GC and memory allocation from the application would be kawaii, but I doubt a stack overflow (actually underflow, it counts backward.) would be nice.

[quote] 70 fps with 100000 3 vertex polygons,
[/quote]
3 vertex polygon is often called triangle. So we would have a 100000 triangles doing a some rotation at 70 FPS. 7 Mil triangles? Buee my Riva TNT2 is too slow.
And now seriously. Xith3D demo had around 14 000 triangles/polygons in one frame, it looked nicely. If you will be thinking about 100000 triangles you would be attempting to do a race with the Square soft for the most inefficient graphic engine in a seriously released game. For your information, you’d lose. They have PS2 and they would have PS3. :slight_smile: (Brute force 30 FPS on PS2. 2 chips specialized on rendering.)
Actually why I said seriously?

to Elias

[quote] And the most important argument is that you get to loose all those ugly static global variables you spoke about, and move them into the functions that need them.
[/quote]
“My sister was bitten by a loose.” ~_^
About what static variables are you talking? Object re-usage, and helper objects are efficient way to deal with this problems. Look at my example at the start of this post, you’d need to emulate something like that.
Of course if you can change stack’s size at runtime, you might have a luxury of object allocation at a 2 MB big stack.

to Cas
So you think we would need to hint GC more than just by System.gc(); I agree, some additional informations to GC would be nice, at the runtime. Also better application controlled memory allocation is necessary. (I didn’t tried 1.5 so I don’t know if -XmxmemoryUnlimited was added as a default.)

to Raghar
I’m not too experienced with 3D graphics, so my example wasn’t that realistic. But I was trying to show that the stack could be quite useful in algorithms, it could optimise the functions in a clear way, and might be a better alternative (in certain cases) to creating objects as static globals.
BTW assembly ga wakarimasen “I don’t know assembly”.

If you allocated a object on the stack at the beggining of a program, would it persist until the end of the program inside the stack frame? and does it keep it’s state between calls to the function without reinitialization?