I’m still not clear on why C++ isn’t an option. Are you all just pathologically afraid of it or something? You don’t have to use all the weird language features…
[quote]To allow 95% of work to be done on GPU, rest 5% has to run fast.
[/quote]
That’s an interesting statement. One that also requires some qualification. If the 5% of the CPU code runs in less than 1/60 of a second, then it doesn’t matter how slow Kaffe is.
[quote]AFAIK, Kaffe has next to no optimization for ByteBuffers at the moment. From a quick browse of the code, on EVERY get/put the bounds are checked, a JNI method is called, and inside that JNI method another JNI method is called to get the value of the pointer field; only then is it dereferenced and returned to the caller.
[/quote]
I was looking at the same thing. The problem is that the Java code comes from the GNU Classpath project. It’s up to the Kaffe VM to recognize ByteBuffers as a special circumstance where no bounds checks are needed. And I’m not about to dig through all of Kaffe looking for it. It would be much easier just to profile it.
Even if the optimizations don’t exist, it’s probably not a big deal. Once the textures are committed to the GL Pipeline, then you only have to worry about shunting vertexes. If Cas used massive triangle fans, then I’d worry. But he doesn’t, so the performance probably is not significantly different. The big thing is that the NIO support exists. Without it, LWJGL won’t compile.
[quote]This can easily be hundreds of times slower than the HotSpot implementation. Even if only 1 of the 5 percent above depends on buffer speed, slowing buffer access 200 times would mean that the CPU part takes about 66% of the time instead of 5%, slowing everything down about 3 times.
[/quote]
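The arithmetic in the quoted estimate can be checked with a few lines. This is only a sketch: it assumes the CPU and GPU work run back to back with no overlap, and the class and method names are made up for illustration. Under that assumption the total comes out at roughly 3x and the CPU share at about 68%, close to the quoted 66%.

```java
// Sketch: checks the quoted estimate, assuming CPU and GPU work do not overlap.
// All names here are illustrative, not taken from any real code base.
final class SlowdownEstimate {
    // Total frame cost relative to an original cost of 1.0, when a fraction
    // of the work becomes `factor` times slower and the rest is unchanged.
    static double relativeCost(double slowedFraction, double factor) {
        return (1.0 - slowedFraction) + slowedFraction * factor;
    }

    // Share of the new total that is spent in the CPU part.
    static double cpuShare(double cpuFraction, double slowedFraction, double factor) {
        double cpuCost = (cpuFraction - slowedFraction) + slowedFraction * factor;
        return cpuCost / relativeCost(slowedFraction, factor);
    }

    public static void main(String[] args) {
        // 5% CPU work, of which 1% is buffer access that becomes 200x slower.
        double total = relativeCost(0.01, 200.0);
        double share = cpuShare(0.05, 0.01, 200.0);
        System.out.println("relative total cost: " + total);
        System.out.println("CPU share of frame:  " + share);
    }
}
```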
Ummm… 5% of program execution time != 5% of main CPU time. If his code executes at an effective 100MHz, then it’s still faster than most of the previous generation consoles. Remember, he’s just doing 2D + some effects. Not exactly straining for more horsepower. (Of course, future efforts might. But then you need to profile!)
[quote]Of course, the only way to be sure is to check this. But it is not just a problem of Kaffe supporting NIO. It is a problem of Kaffe supporting DirectByteBuffer deep inside the JIT, optimizing it as much as possible. Two JNI calls for every float get/put is just not acceptable for anything even remotely real time, unless you transfer entire arrays at once. But Buffers were created to avoid working on Java arrays…
[/quote]
Is there somewhere I’m missing something? From what I can see, he doesn’t have any triangle fans or vertex arrays. In fact, he’s probably using a set of vertex calls or glRect() calls. No ByteBuffers required. The only time he would need to worry is when he sends the textures to the card. Since that’s a one-time operation, any slowdown should be unnoticeable.
Now with all that out of the way, let me mention that Kaffe does have a JIT compiler for certain platforms. So it’s imperative that you profile the target platform to see if it will work for you.
I don’t want to use C++ because of all the problems it has. It’s fiddly and complicated, even without using the sharp pointy bits. It always takes much longer to write anything nontrivial in it. I’m quite able to write C++, obviously; I just abandoned it for good reasons: safety, ease, stability and productivity.
The SciTech drivers are just an awesome bit of coolness. I tried them a few years ago and they were unstable as hell but if they can work on the XBox, then with Jet in tow, I’ve got a clear path to the console market! Result! I’ll ask a couple of Xbox dev friends of mine about getting hold of the dev kit, because I reckon all my games would be great Xbox Live candidates.
Also - to get 60fps in Alien Flux I use about 500-600MHz flat out on a Hotspot 1.4 VM. Kaffe is at least 3x slower and I’m afraid that’d be completely unacceptable.
Cas
I know it has a JIT. At the same time, I’m 95% sure that the JIT has no idea about ByteBuffers. I’m trying to keep an eye on Kaffe development (even if only by browsing CVS commits/changelogs) and would probably have noticed something like this.
I agree with you that for the specific games Cas has, ByteBuffer performance is probably not critical. I forgot the context and started talking about the more generic case, where you need to do some CLOD/skin/bone/etc. on the CPU, and where buffer performance would certainly be a major issue.
Run Alien Flux with -Xint -Xprof to get an idea of what Kaffe performance would be like and see where the bottlenecks lie…
Ah, and you’ll be unhappy to discover that Flux rendering is extremely complex and very performance intensive.
To save you the bother, on a 2GHz laptop with a Gf4Go in it, running maxed out I get 10fps:
Flat profile of 398.68 secs (33440 total ticks): main
Interpreted + native Method
13.1% 4363 + 0 com.shavenpuppy.jglib.algorithms.RadixSort.sort
6.0% 2011 + 0 java.nio.Buffer.nextPutIndex
3.6% 1217 + 0 com.shavenpuppy.jglib.sprites.SpriteRenderer.writeSpriteToBuffer
3.2% 1063 + 0 com.shavenpuppy.jglib.algorithms.RadixSort.sort
2.7% 909 + 0 xap.effects.WobblyPlaneEffect$WobblyPlaneEffectInstance.wibble
2.1% 688 + 0 java.nio.DirectByteBuffer.put
2.0% 673 + 0 com.shavenpuppy.jglib.geometry.Plane.distanceTo
2.0% 0 + 659 sun.misc.Unsafe.putByte
1.7% 560 + 0 com.shavenpuppy.jglib.resources.ColorSequenceResource.getColor
1.7% 556 + 0 java.nio.DirectByteBuffer.ix
1.7% 552 + 0 java.nio.DirectFloatBufferU.put
1.6% 534 + 0 com.shavenpuppy.jglib.geometry.Box.classify
1.6% 525 + 0 java.util.Arrays.fill
1.5% 0 + 515 sun.misc.Unsafe.putFloat
1.5% 505 + 0 java.nio.Buffer.position
1.5% 487 + 0 java.nio.DirectFloatBufferU.ix
1.4% 475 + 0 java.nio.Buffer.nextGetIndex
1.3% 426 + 0 com.shavenpuppy.jglib.renderer.Renderer.sort
1.3% 425 + 0 com.shavenpuppy.jglib.geometry.Plane.classify
1.1% 362 + 0 com.shavenpuppy.jglib.renderer.Renderer.build
1.0% 345 + 0 xap.effects.WobblyPlaneEffect$WobblyPlaneEffectInstance.wobble
1.0% 0 + 325 org.lwjgl.opengl.Win32Display.swapBuffers
0.9% 308 + 0 java.util.Arrays.fill
0.9% 306 + 0 com.shavenpuppy.jglib.Image.mergePlanes
0.9% 305 + 0 org.lwjgl.util.Color.setColor
100.0% 29023 + 4363 Total interpreted (including elided)
Thread-local ticks:
0.1% 46 Blocked (of total)
0.0% 7 Class loader
0.0% 1 Unknown: no last frame
Global summary of 398.68 seconds:
100.0% 33492 Received ticks
0.1% 50 Received GC ticks
0.0% 2 Other VM operations
0.0% 7 Class loader
0.0% 1 Unknown code
Notice that outside of radix sorting, the vast majority of the time is spent in shoving data into bytebuffers… 19% of CPU time in fact. The game code itself doesn’t even get a look in.
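For anyone wanting to check the 19% figure, here is a small sketch that adds up the buffer-related rows from the flat profile above. Which rows count as "shoving data into bytebuffers" is an assumption made for this example (the java.nio and sun.misc.Unsafe entries); the class name is made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: sums the buffer-related percentages copied from the flat profile.
// The choice of rows is an assumption made for this illustration.
final class BufferShare {
    static Map<String, Double> bufferRows() {
        Map<String, Double> rows = new LinkedHashMap<>();
        rows.put("java.nio.Buffer.nextPutIndex", 6.0);
        rows.put("java.nio.DirectByteBuffer.put", 2.1);
        rows.put("sun.misc.Unsafe.putByte", 2.0);
        rows.put("java.nio.DirectByteBuffer.ix", 1.7);
        rows.put("java.nio.DirectFloatBufferU.put", 1.7);
        rows.put("sun.misc.Unsafe.putFloat", 1.5);
        rows.put("java.nio.Buffer.position", 1.5);
        rows.put("java.nio.DirectFloatBufferU.ix", 1.5);
        rows.put("java.nio.Buffer.nextGetIndex", 1.4);
        return rows;
    }

    // Adds up the percentage column of the selected rows.
    static double total(Map<String, Double> rows) {
        double sum = 0.0;
        for (double percent : rows.values()) {
            sum += percent;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("buffer-related share (%): " + total(bufferRows()));
    }
}
```

With these rows the sum lands a little over 19%, and that is before counting SpriteRenderer.writeSpriteToBuffer, which does the actual writing.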
Cas
cough
Troll.
[quote]Run Alien Flux with -Xint -Xprof to get an idea of what Kaffe performance would be like and see where the bottlenecks lie…
[/quote]
Actually, that may not be a very good indicator. Kaffe does have a JIT, so it should perform somewhat faster than -Xint.
[quote]Ah, and you’ll be unhappy to discover that Flux rendering is extremely complex and very performance intensive. Notice that outside of radix sorting, the vast majority of the time is spent in shoving data into bytebuffers… 19% of CPU time in fact. The game code itself doesn’t even get a look in.
[/quote]
- What is the radix sorting code doing?
- What in blazes are you doing that requires ByteBuffers in every frame?
Inquiring minds want to know!
[quote]To save you the bother, on a 2GHz laptop with a Gf4Go in it, running maxed out I get 10fps:
[/quote]
I got similar results on my PIII 733. The game could still be playable at that speed, though; it would just need some tweaks to the movement rate per frame.
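One common way to do that kind of tweak is to scale movement by the elapsed frame time, so things cover the same distance per second at 10fps as at 60fps. The following is only a sketch of that idea with made-up names and numbers; it says nothing about how Alien Flux actually handles movement.

```java
// Sketch of time-based movement; names and numbers are illustrative only.
final class TimeBasedMovement {
    // Advance a position by a speed (units per second) over one frame's duration.
    static double advance(double position, double unitsPerSecond, double frameSeconds) {
        return position + unitsPerSecond * frameSeconds;
    }

    // Simulate one second of play at a fixed frame rate.
    static double simulateOneSecond(int framesPerSecond, double unitsPerSecond) {
        double position = 0.0;
        double frameSeconds = 1.0 / framesPerSecond;
        for (int frame = 0; frame < framesPerSecond; frame++) {
            position = advance(position, unitsPerSecond, frameSeconds);
        }
        return position;
    }

    public static void main(String[] args) {
        // Both runs should end up at (very nearly) the same position.
        System.out.println(simulateOneSecond(60, 120.0));
        System.out.println(simulateOneSecond(10, 120.0));
    }
}
```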
The radix sorter is sorting primitives for rendering and sprites. Some of it is actually overkill but as it represents very little CPU work in the JITed version I left it alone.
The buffers are filled every frame with up to 4000-odd sprites and particles.
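To make the buffer-filling cost concrete, here is a rough sketch of the two ways of getting per-frame vertex data into a direct FloatBuffer: one put() call per float, or staging everything in a float[] and doing a single bulk put. The sprite layout (8 floats per sprite) and all names are assumptions for illustration, not the actual SpriteRenderer code. On a VM that turns each put() into JNI calls, the bulk version makes far fewer calls.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Sketch: per-element vs bulk writes into a direct buffer.
// The layout below is hypothetical (4 corners * x,y per sprite).
final class SpriteBufferFill {
    static final int FLOATS_PER_SPRITE = 8;

    // Allocate a native-order direct buffer big enough for `sprites` sprites.
    static FloatBuffer newDirectBuffer(int sprites) {
        return ByteBuffer.allocateDirect(sprites * FLOATS_PER_SPRITE * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    // One put() call per float: cheap if the JIT inlines it,
    // expensive if every call has to cross into native code.
    static void fillPerElement(FloatBuffer dst, float[] vertices) {
        dst.clear();
        for (float v : vertices) {
            dst.put(v);
        }
        dst.flip();
    }

    // A single bulk transfer of the whole staging array.
    static void fillBulk(FloatBuffer dst, float[] vertices) {
        dst.clear();
        dst.put(vertices, 0, vertices.length);
        dst.flip();
    }

    public static void main(String[] args) {
        int sprites = 4000;
        float[] staging = new float[sprites * FLOATS_PER_SPRITE];
        FloatBuffer buffer = newDirectBuffer(sprites);
        fillBulk(buffer, staging);
        System.out.println("floats ready to send: " + buffer.remaining());
    }
}
```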
Cas
[quote]The radix sorter is sorting primitives for rendering and sprites. Some of it is actually overkill but as it represents very little CPU work in the JITed version I left it alone.
[/quote]
Hmmm… overkill indeed. I’m thinking a simple “render sprites in whatever order, then lasers, then render particles” would have sufficed. But maybe that’s just me.
[quote]The buffers are filled every frame with up to 4000 odd sprites and particles.
[/quote]
I tried to download the source to see this, but the source code seems to have gone missing. In any case, I’m still wondering if that’s really going to be a performance problem. If I get Kaffe compiled, I’ll have to give it a try and see.
'snot really overkill… there’s a lot more going on than meets the eye. Sprites are first sorted by layer, then Y coordinate, then texture, then rendering style. And there are an awful lot of them.
Even the background goes through a triangle pumping engine that sorts all the triangles in order of rendering style and adjacency. That’s the overkill bit, but relatively few triangles are involved in the background.
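A multi-level ordering like the sprite one described above (layer, then Y coordinate, then texture, then rendering style) can be expressed as a single integer key, which is what makes a radix sort applicable in the first place. The sketch below is an illustration only: the bit widths and names are invented, and it uses Arrays.sort as a stand-in rather than the jglib RadixSort.

```java
import java.util.Arrays;

// Sketch: packs four sort criteria into one long so that plain numeric order
// equals "layer, then y, then texture, then style". Bit widths are hypothetical
// and all inputs are assumed to be non-negative and within range.
final class SpriteSortKey {
    static long key(int layer, int y, int texture, int style) {
        return ((long) (layer & 0xFF) << 40)      // most significant: layer
             | ((long) (y & 0xFFFF) << 24)        // then Y coordinate
             | ((long) (texture & 0xFFFF) << 8)   // then texture id
             | (style & 0xFF);                    // least significant: style
    }

    static int layerOf(long key) {
        return (int) (key >>> 40) & 0xFF;
    }

    static int yOf(long key) {
        return (int) (key >>> 24) & 0xFFFF;
    }

    public static void main(String[] args) {
        long[] keys = {
            key(2, 0, 0, 0),
            key(1, 11, 0, 0),
            key(1, 10, 5, 1),
            key(1, 10, 5, 0),
        };
        Arrays.sort(keys); // stand-in for a radix sort over the same keys
        for (long k : keys) {
            System.out.println("layer=" + layerOf(k) + " y=" + yOf(k));
        }
    }
}
```

Sorting on texture and style after position keeps sprites that share state next to each other, which is presumably the point of including them in the key.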
Cas
Cas: Be sure to let us know if you get your games to run on an XBox with JET and that graphics driver package.
With all the three-letter companies recently releasing hundreds of patents to open source development, one can always hope that there is some truth to the rumors at Javalobby of one of the companies putting out an open source JVM, but I guess that’s a very very long shot…
William
Emails have been sent, contacts have been made. I’m going to pursue this idea until I get a conclusion. Xbox live here we come!
Cas
[quote]Emails have been sent, contacts have been made. I’m going to pursue this idea until I get a conclusion. Xbox live here we come!
[/quote]
grin See what happens when you set your mind to something?
There is another possible option: the IKVM project (http://www.ikvm.net/). It is a JVM for .Net and also includes a bytecode to .Net CIL compiler. So you might be able to have your cake and eat it.
I played a little with the Mono runtime sources, and it’s very easy to make your own custom runtime, removing useless crap like Windows.Forms. It’s easy to clean up (like sorting the crap out of rt.jar and the rest) and the license allows it.
So you can probably embed your own Mono runtime for ~5MB. NOW WHAT IS SUN WAITING FOR? LET US FORGE AN EMBEDDED JRE!
FWIW, I think for now distribution with an embedded VM is the best way, especially because almost none of my games work anymore on 1.5 >:(
I’m currently being flooded with emails, especially from JEmu users, saying that most games stopped working after they upgraded their JVM.
[quote]FWIW, I think for now distribution with an embedded VM is the best way, especially because almost none of my games work anymore on 1.5 >:(
I’m currently being flooded with emails, especially from JEmu users, saying that most games stopped working after they upgraded their JVM.
[/quote]
What was incompatible between these two versions? My programs worked without problems.
I haven’t checked why all my games (and sound lib) don’t work anymore, but something has definitely changed with JavaSound, maybe input, and XML. I probably did something wrong with Cosmic Trip in the way I included the XML parser, since the XML packages I included simply are not there anymore. With JEmu, I’m not sure why about 80% of the games just hang. I don’t think it has anything to do with sound, because games with no sound stopped working too (and btw sound had already stopped working as of 1.4.something). With JEmu2, there was a problem with sound initialization, which I had to change to make it work again. With CottAGE, I’m not sure what’s wrong. For some games input doesn’t work anymore, for others sound doesn’t work anymore, and some stopped working altogether. On my machine, it still kinda works, except that it starts running really slowly after a while.
I’m not sure why my SoftSynth lib doesn’t work, probably a sound init issue too.
Fortunately, the programs I made for my job still work! (phew!)
How odd. Fortunately my LWJGL code all still works fine under 1.5! ;D
Mine too.
Cas