[quote]It would only help a limited set of code, and you’d have to write the code to a very narrow range
[/quote]
We’d appreciate a document describing what those narrow ranges are/will be.
PS: Thanks for your efforts and for keeping us informed.
Well, from what I understand, SIMD only works on sets of similar data. I’m not an expert at all, but SIMD lets you do the same operation on multiple pieces of data at once (Single Instruction, Multiple Data); that much I know from my computer science classes a long time ago. But I’m unsure how Intel implemented this in SSE/SSE2, and I’m still researching it. Again, if anyone has good documentation on SSE or a recommendation for a book, I’m listening!
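To make that concrete, here’s a toy sketch of the kind of loop SIMD is built for: the same operation applied element-by-element across arrays of like-typed data (the class and method names are just made up for illustration):

```java
import java.util.Arrays;

public class SimdCandidate {
    // One scalar add per iteration; an SSE-capable compiler could
    // instead process four floats per instruction, because every
    // element gets exactly the same operation.
    static void add(float[] a, float[] b, float[] dst) {
        for (int i = 0; i < dst.length; i++) {
            dst[i] = a[i] + b[i]; // Single Instruction, Multiple Data
        }
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f};
        float[] b = {10f, 20f, 30f, 40f};
        float[] dst = new float[4];
        add(a, b, dst);
        System.out.println(Arrays.toString(dst)); // [11.0, 22.0, 33.0, 44.0]
    }
}
```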
How about this for a start:
http://www.cortstratton.org/articles/OptimizingForSSE.php
https://shale.intel.com/SoftwareCollege/CourseDetails.asp?courseID=23
I’m sure if you ask Intel they will be forthcoming. They do some very nice docs as well as training courses several times a year.
From my previous experience with it, the data types must be the same across all elements, and unless you pre-format the data into something suitable, you end up spending as many instructions to shuffle/pack the data into the form SSE needs as it would save.
For example, a matrix multiply operation is not much faster in SSE because you have to transpose a matrix which takes quite a few cycles to do.
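To illustrate why (plain Java here, not SSE code): in a row-major 4×4 multiply, the inner loop walks one operand by column, so a straightforward SSE version typically transposes that operand first to get the column data contiguous, and those shuffle cycles eat into the gains:

```java
public class Mat4 {
    // Row-major 4x4 multiply: dst = a * b. Note the inner loop reads
    // b with stride 4 (by column) -- this is the access pattern that
    // forces an SSE implementation to transpose b first.
    static void mul(float[] a, float[] b, float[] dst) {
        for (int r = 0; r < 4; r++) {
            for (int c = 0; c < 4; c++) {
                float sum = 0f;
                for (int k = 0; k < 4; k++) {
                    sum += a[r * 4 + k] * b[k * 4 + c];
                }
                dst[r * 4 + c] = sum;
            }
        }
    }
}
```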
Specifically, it’s matrix4 and vector4 operations that need SIMD acceleration mostly as the highest priority, so it might be rather useful if these made it into the Java language as primitives and then got intrinsified.
After that, sound and video decoding are the main uses, along with signal processing.
Cas
You don’t even need to go that far. If we could get a version of the javax.vecmath package that used native code for everything, that would make a huge performance difference to a lot of applications.
[quote]If we could get a version of the javax.vecmath package that used native code for everything, that would make a huge performance difference to a lot of applications.
[/quote]
That would indeed be a quick’n’dirty solution (and would help lots of apps), but no, I’d hate that. It’s a bad API and a bad implementation. The effort should be better spent on a more generic solution.
I’m not sure about primitive types either. I’d prefer it if the VM could analyze the code and optimize on any possible opportunity (even non-vecmath code). I do realize though how difficult that is…
It’s a heck of a lot less bad than generics, and for an awful lot of applications more valuable (especially if they were to do a little bit of updating and unify vecmath a little with J2D’s Point classes etc).
No disagreement on the implementation - but there are better impls available already for free.
/me ducks and runs for cover
[quote]After that, sound and video decoding are the main uses, along with signal processing.
[/quote]
I think that level is beyond what Azeem is doing at the compiler level, but just for the record I will pipe up with my usual complaint re the lack of optimization in existing native code of the JRE.
SSE must be used for JPEG decoding and encoding; to not do so is to throw performance out the window. It’s like purposely using a bubble sort when any other sort algorithm would give a massive performance improvement.
SSE can also be used to get massive performance gains in software loops that do image scaling and blitting.
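For example, a nearest-neighbour scaling loop of the sort in question (a hypothetical sketch, not code from the JRE): every destination pixel is computed the same way from independent data, which is exactly the loop shape SSE blitting code accelerates:

```java
public class Scaler {
    // Nearest-neighbour scale of an ARGB image stored as an int[].
    // Each destination pixel is an independent copy computed the same
    // way -- a natural candidate for SIMD blitting.
    static int[] scale(int[] src, int sw, int sh, int dw, int dh) {
        int[] dst = new int[dw * dh];
        for (int y = 0; y < dh; y++) {
            int sy = y * sh / dh; // source row for this destination row
            for (int x = 0; x < dw; x++) {
                dst[y * dw + x] = src[sy * sw + x * sw / dw];
            }
        }
        return dst;
    }
}
```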
Yeah, it’s definitely beyond just one person to get those kinds of changes into a JDK; I really can only make changes in the VM. Plus I was thinking more along the lines of a loop optimization that recognizes certain types of loops and emits instructions appropriately (in this case, the SIMD instructions).
The first Mustang builds are available. Is something that you mentioned already contained in those builds?
http://j2se.dev.java.net/
That’s amazing, you found this out before I did! Anyway yeah that would be where all my changes went to. B12 has everything except for the Math.abs() stuff, that’s in B13. Enjoy
Some quick benchmarking with b12 (compared to 5.0 final) on a Mandelbrot benchmark I have which uses Math.log:
5.0 client - 1219 ms
5.0 server - 906 ms
6.0 client - 1203 ms
6.0 server - 562 ms
So client is basically the same while server is about 60% faster (with this particular example).
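For reference, the hot path is essentially a loop dominated by `Math.log` calls, which is what the new server-compiler intrinsic speeds up. A simplified stand-in (not the actual Mandelbrot benchmark):

```java
public class LogBench {
    // A Math.log-dominated loop; with LN intrinsified, the call
    // overhead disappears and the loop runs substantially faster.
    static double sumLogs(int n) {
        double sum = 0.0;
        for (int i = 1; i <= n; i++) {
            sum += Math.log(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        double s = sumLogs(5000000);
        long t1 = System.nanoTime();
        System.out.println(s + " in " + (t1 - t0) / 1000000 + " ms");
    }
}
```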
Yeah, LN and LOG10 are the only math intrinsics that are not in Client but are in Server. I probably should go back and add them
Actually, to be picky, it’s 38% faster: 1 - 562/906 ≈ 0.38.
Ok I’ve got LN and LOG10 intrinsified in client as well, should be available in b14 or b15.
Is escape analysis going to make it into 6.0? Sure would like to see an end to some of the crappy programming practices I have to employ.
Cas
[quote]I don’t suppose you’d care to champion Structs with me?
Cas
[/quote]
Hi Cas, you might be interested in visiting the Mustang Forums; they are also pushing for structs there, masquerading under extprim or some such.
Seb
[quote]The first Mustang builds are available. Is something that you mentioned already contained in those builds?
http://j2se.dev.java.net/
[/quote]
Here’s the changelog for build 12:
http://forums.java.net/jive/thread.jspa?threadID=58&tstart=105
Seb
Well, Structs have made it into the Top 10 RFEs, and still no reply from Sun.
Structs crosspost:
http://www.java-gaming.org/cgi-bin/JGNetForums/YaBB.cgi?board=Offtopic;action=display;num=1055068121;start=30#36