New VM performance improvements

Operations on streams can be very cool too ( and subject to massive optimizations using MMX, 3DNow, SSE, SSE2, SSE3, etc ).


void add( int value, int[] stream, int offset, int length );
void mul( float factor, float[] stream, int offset, int length );
void xor( short mask, short[] stream, int offset, int length );
void lshift( int amount, int[] stream, int offset, int length );
void clamp( byte lbound, byte ubound, byte[] stream, int offset, int length );
...

That’s why we need an SSE class that can be intrinsified on platforms that support it natively, or simply implemented in tight machine code otherwise. We had something along those lines in LWJGL in the early days but, er, as we didn’t know shit from shinobi we took it out :wink:
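As a rough illustration, the signatures above could start out as a plain scalar fallback like the sketch below, which a VM or native library might later replace with vector instructions. The class name `StreamOps` and the half-open range convention `[offset, offset + length)` are assumptions, not part of any existing API.

```java
// Hypothetical scalar fallback for the stream operations listed above.
// StreamOps is an assumed name, not an existing LWJGL or JDK class.
final class StreamOps {
    private StreamOps() {}

    /** Adds value to every element in stream[offset, offset + length). */
    static void add(int value, int[] stream, int offset, int length) {
        for (int i = offset, end = offset + length; i < end; i++) {
            stream[i] += value;
        }
    }

    /** Multiplies every element in the range by factor. */
    static void mul(float factor, float[] stream, int offset, int length) {
        for (int i = offset, end = offset + length; i < end; i++) {
            stream[i] *= factor;
        }
    }

    /** XORs every element in the range with mask. */
    static void xor(short mask, short[] stream, int offset, int length) {
        for (int i = offset, end = offset + length; i < end; i++) {
            stream[i] ^= mask;
        }
    }

    /** Shifts every element in the range left by amount bits. */
    static void lshift(int amount, int[] stream, int offset, int length) {
        for (int i = offset, end = offset + length; i < end; i++) {
            stream[i] <<= amount;
        }
    }

    /** Limits every element in the range to [lbound, ubound]. */
    static void clamp(byte lbound, byte ubound, byte[] stream, int offset, int length) {
        for (int i = offset, end = offset + length; i < end; i++) {
            byte v = stream[i];
            if (v < lbound) {
                stream[i] = lbound;
            } else if (v > ubound) {
                stream[i] = ubound;
            }
        }
    }
}
```

Each loop touches independent elements with no cross-iteration dependency, which is the property that makes this kind of code a candidate for SIMD.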

Cas :slight_smile:

I like the idea of an SSE library. Maybe someone here with strong expertise could start a new project on java.net

Unfortunately, it won’t be me, as I don’t feel strong enough in that domain.

I like the write-once paradigm of Java, but there are times when it’s faster to develop a multiplatform performance library than to rely on HotSpot optimizations. For applications using JOGL/LWJGL, which already require .dll and .so files to be bundled with them, another tiny native lib compatible with the same platform base would not be harmful…

Lilian

[quote]Azeem, that’s a poor reason for not implementing sincos. If sincos is an instruction supported on 90% of the world’s computers then you may as well add it for those that can gain performance from it. For the remaining machines, it won’t make any difference.
[/quote]
Except I have no control over what can go into the libraries. I could file a bug and hope somebody over there would find it interesting enough to add a Math.sincos(), and then I could intrinsify it. Or alternatively I could add a very contrived match that checked for a sin and cos on the same variable right next to each other. Hmmm, actually that might not be hard, but it’s very specialized.
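To make the two options concrete, here is a minimal sketch of what such a helper might look like on the library side. `java.lang.Math` has no `sincos()`, so the class name `SinCos`, the method name and the two-element output layout are all assumptions for illustration.

```java
// Hypothetical helper: java.lang.Math has no sincos(), so this name and the
// out[0] = sin, out[1] = cos layout are assumptions for illustration only.
final class SinCos {
    private SinCos() {}

    /** Writes sin(angle) to out[0] and cos(angle) to out[1]. */
    static void sincos(double angle, double[] out) {
        // A sin and a cos on the same variable, right next to each other:
        // the pattern a VM could in principle match and fuse.
        out[0] = Math.sin(angle);
        out[1] = Math.cos(angle);
    }
}
```

On a VM that does not recognize the pattern, this is simply two separate calls, so nothing is lost on the remaining machines.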

Oh and Ragher check out System.gc()
:slight_smile:

[quote]Hi guys,
I’m open for suggestions to other improvements that might help.
[/quote]
If you’re still interested, email me on adam at grexengine.com.

I’ve been gathering complaints from C++ games devs, and whittling them down to the few that depend on fundamental ops that Java lacks. Personally, I’m not sure how valid many of them are - I’m happy to discuss them with you via email, and then go request more info from the individual devs wherever you want more detail / evidence / etc.

[quote]Oh and Ragher check out System.gc()
[/quote]
One size fits them all…
Used it and think it could be much better. Especially if it would look for methods named clean.

BTW where is System.gc(long)?

Deadcow:

[quote] What about adding “clamp” methods to Math ?
[/quote]
I did it already. It’s in my image library, with a few conversion routines like RGBtoR … and some others.

Clamp is a programming function, not a math function. If you’d like to add something to Math, it could be FFT, memory-intensive FFT, DCT, IDCT, matrix operations, and possibly something else helpful.
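For reference, the kind of clamp being discussed could be sketched as below. The class name `ClampSketch` is hypothetical, and the code assumes `min <= max`; it is not taken from the image library mentioned above.

```java
// Hypothetical clamp helpers; assumes min <= max in every call.
final class ClampSketch {
    private ClampSketch() {}

    /** Limits value to the closed range [min, max]. */
    static int clamp(int value, int min, int max) {
        return Math.max(min, Math.min(max, value));
    }

    /** Limits a colour component to 0..255, a typical image-processing use. */
    static int clampToByte(int component) {
        return clamp(component, 0, 255);
    }
}
```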

I bet that once escape analysis comes along that no-one will be worried about GC any more.

Cas :slight_smile:

Neither max nor min is actually a “math function”, and they can be replaced by


// max
a > b ? a : b;
// min
a > b ? b : a;

They exist for the sake of clarity ( and possible optimization ), as clamp should.

Actually they are:


// min
return (a <= b) ? a : b;
// max
return (a >= b) ? a : b;

And in the case of doubles there is some NaN handling and what not.
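A small sketch of why the NaN handling matters for doubles: the ternary form and `Math.max` disagree on NaN and on signed zeros. The class and method names here are hypothetical.

```java
// Hypothetical comparison helper: the plain ternary form of max for doubles.
final class MinMaxNaN {
    private MinMaxNaN() {}

    /** Ternary max with no special handling of NaN or signed zero. */
    static double naiveMax(double a, double b) {
        // Any comparison involving NaN is false, so a NaN in 'a' is
        // silently dropped and 'b' is returned instead.
        return (a >= b) ? a : b;
    }
}
```

With this sketch, `MinMaxNaN.naiveMax(Double.NaN, 1.0)` returns `1.0`, whereas `Math.max(Double.NaN, 1.0)` returns NaN. Likewise `naiveMax(-0.0, 0.0)` returns `-0.0`, while `Math.max(-0.0, 0.0)` returns positive zero.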

If we are talking about new APIs in the Math area, or just certain calculations to recognise and optimize… something that converts between polar and cartesian coordinates would be useful, since the current Math.atan methods (atan2 in particular) only return part of the answer but have usually calculated more… I mean the angle and radius are known internally, but only the angle is returned, so a duplicate square root calculation might happen. Stuff like that can help in various 2D and 3D games.
There was a discussion about this on Apple’s performance tuning email lists recently that I found interesting.
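Roughly, the conversion in question looks like this with today's API, where the radius and angle come from two separate calls. The class name `Polar` and the `{radius, angle}` array layout are assumptions for illustration.

```java
// Hypothetical conversion helpers built on existing java.lang.Math calls.
final class Polar {
    private Polar() {}

    /** Returns {radius, angle in radians} for the cartesian point (x, y). */
    static double[] toPolar(double x, double y) {
        double radius = Math.hypot(x, y);  // separate square root calculation
        double angle = Math.atan2(y, x);   // returns only the angle
        return new double[] { radius, angle };
    }

    /** Returns {x, y} for the polar point (radius, angle in radians). */
    static double[] toCartesian(double radius, double angle) {
        return new double[] { radius * Math.cos(angle), radius * Math.sin(angle) };
    }
}
```

A combined library method, or a VM that recognized the two calls together, could share the intermediate work instead of computing it twice.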

Re: the clamping functions…

When using software loops to blend RGB image data (with Alpha), vector instructions would offer a HUGE advantage. Is there any way the compiler could recognize those types of loops? Though, I must admit it isn’t worth it at all if the graphics processor can be used to accelerate the operation entirely… so it is probably best for the Java2D guys to take care of that at a higher level.

So I guess a general-purpose API for access to vector operations, like the one Cas described, is really more important.
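For context, the kind of software blending loop mentioned above might look like the sketch below. It assumes packed ARGB `int` pixels, non-premultiplied source alpha and an opaque result; the class name `BlendSketch` is hypothetical.

```java
// Hypothetical scalar alpha blend over packed ARGB pixels.
// Assumes non-premultiplied alpha in src and forces the result to be opaque.
final class BlendSketch {
    private BlendSketch() {}

    /** Blends src over dst in place for pixels in [offset, offset + length). */
    static void blend(int[] src, int[] dst, int offset, int length) {
        for (int i = offset, end = offset + length; i < end; i++) {
            int s = src[i];
            int d = dst[i];
            int a = s >>> 24;   // source alpha, 0..255
            int ia = 255 - a;   // weight of the destination pixel
            // The same multiply-add is repeated per channel, which is what
            // makes the loop attractive for vector instructions.
            int r = (((s >> 16) & 0xFF) * a + ((d >> 16) & 0xFF) * ia) / 255;
            int g = (((s >> 8) & 0xFF) * a + ((d >> 8) & 0xFF) * ia) / 255;
            int b = ((s & 0xFF) * a + (d & 0xFF) * ia) / 255;
            dst[i] = 0xFF000000 | (r << 16) | (g << 8) | b;
        }
    }
}
```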

I found this.
http://www.virtualdub.org/blog/pivot/entry.php?id=21

Of course, Prescott differs from previous CPUs by its increased heat and somewhat revamped ALUs.

Bad latencies between SSE2 and the rest of the CPU could make some algorithms faster on “general” purpose registers. Of course, you could say this doesn’t matter: they will repair it in the next generation, and add a lot of other problems.

[quote]Operations on streams can be very cool too ( and subject to massive optimizations using MMX, 3DNow, SSE, SSE2, SSE3, etc ).


void add( int value, int[] stream, int offset, int length );
void mul( float factor, float[] stream, int offset, int length );
void xor( short mask, short[] stream, int offset, int length );
void lshift( int amount, int[] stream, int offset, int length );
void clamp( byte lbound, byte ubound, byte[] stream, int offset, int length );
...

[/quote]
Ah yes, the wonderful MMX, where 1*1 = 0.5

I would love to be able to use SIMD from java, though.