[quote]It would only help a limited set of code, and you’d have to write the code to a very narrow range
[/quote]
We’d appreciate a document describing what those narrow ranges are/will be.
PS: Thanks for your efforts and for keeping us informed.
Well, from what I understand, SIMD only works on sets of similar data. I’m not an expert at all, but SIMD lets you do the same operation on multiple pieces of data at once (Single Instruction, Multiple Data); that much I know from my computer science classes a long time ago. But I’m unsure how Intel implemented this in SSE/SSE2, and I’m still researching it. Again, if anyone has good documentation on SSE or a recommendation for a book, I’m listening!
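To make that concrete, here’s a toy sketch of the kind of loop SIMD is built for: the same operation applied element-by-element across arrays of like-typed data (the class and method names are just made up for illustration):

```java
import java.util.Arrays;

public class SimdCandidate {
    // One scalar add per iteration; an SSE-capable compiler could
    // instead process four floats per instruction, because every
    // element gets exactly the same operation.
    static void add(float[] a, float[] b, float[] dst) {
        for (int i = 0; i < dst.length; i++) {
            dst[i] = a[i] + b[i]; // Single Instruction, Multiple Data
        }
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f};
        float[] b = {10f, 20f, 30f, 40f};
        float[] dst = new float[4];
        add(a, b, dst);
        System.out.println(Arrays.toString(dst)); // [11.0, 22.0, 33.0, 44.0]
    }
}
```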
How about this for a start:
http://www.cortstratton.org/articles/OptimizingForSSE.php
https://shale.intel.com/SoftwareCollege/CourseDetails.asp?courseID=23
I’m sure if you ask Intel they will be forthcoming. They do some very nice docs as well as training courses several times a year.
From my previous experience with it, the data types must be the same across all elements, and unless you pre-format the data into something suitable, you end up spending as many instructions to shuffle/pack the data into the form SSE needs as it would save.
For example, a matrix multiply operation is not much faster in SSE because you have to transpose a matrix which takes quite a few cycles to do.
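To illustrate why (plain Java here, not SSE code): in a row-major 4×4 multiply, the inner loop walks one operand by column, so a straightforward SSE version typically transposes that operand first to get the column data contiguous, and those shuffle cycles eat into the gains:

```java
public class Mat4 {
    // Row-major 4x4 multiply: dst = a * b. Note the inner loop reads
    // b with stride 4 (by column) -- this is the access pattern that
    // forces an SSE implementation to transpose b first.
    static void mul(float[] a, float[] b, float[] dst) {
        for (int r = 0; r < 4; r++) {
            for (int c = 0; c < 4; c++) {
                float sum = 0f;
                for (int k = 0; k < 4; k++) {
                    sum += a[r * 4 + k] * b[k * 4 + c];
                }
                dst[r * 4 + c] = sum;
            }
        }
    }
}
```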
Specifically, it’s matrix4 and vector4 operations that need SIMD acceleration mostly as the highest priority, so it might be rather useful if these made it into the Java language as primitives and then got intrinsified.
After that, sound and video decoding are the main uses, along with signal processing.
Cas
You don’t even need to go that far. If we could get a version of the javax.vecmath package that used native code for everything, that would make a huge performance difference to a lot of applications.
[quote]If we could get a version of the javax.vecmath package that used native code for everything, that would make a huge performance difference to a lot of applications.
[/quote]
That would indeed be a quick’n’dirty solution (and would help lots of apps), but no, I’d hate that. It’s a bad API and a bad implementation. The effort should be better spent on a more generic solution.
I’m not sure about primitive types either. I’d prefer it if the VM could analyze the code and optimize on any possible opportunity (even non-vecmath code). I do realize though how difficult that is…
It’s a heck of a lot less bad than generics, and for an awful lot of applications more valuable (especially if they were to do a little bit of updating and unify vecmath a little with J2D’s Point classes etc).
No disagreement on the implementation - but there are better impls available already for free.
/me ducks and runs for cover
[quote]After that, sound and video decoding are the main uses, along with signal processing.
[/quote]
I think that level is beyond what Azeem is doing at the compiler level, but just for the record I will pipe up with my usual complaint re the lack of optimization in existing native code of the JRE.
SSE must be used for JPEG decoding and encoding; to not do so is to throw performance out the window. It’s like purposely using a bubble sort when any other sort algorithm would give a massive performance improvement.
SSE can also be used to get massive performance gains in software loops that do image scaling and blitting.
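For example, a nearest-neighbour scaling loop of the sort in question (a hypothetical sketch, not code from the JRE): every destination pixel is computed the same way from independent data, which is exactly the loop shape SSE blitting code accelerates:

```java
public class Scaler {
    // Nearest-neighbour scale of an ARGB image stored as an int[].
    // Each destination pixel is an independent copy computed the same
    // way -- a natural candidate for SIMD blitting.
    static int[] scale(int[] src, int sw, int sh, int dw, int dh) {
        int[] dst = new int[dw * dh];
        for (int y = 0; y < dh; y++) {
            int sy = y * sh / dh; // source row for this destination row
            for (int x = 0; x < dw; x++) {
                dst[y * dw + x] = src[sy * sw + x * sw / dw];
            }
        }
        return dst;
    }
}
```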
Yeah, it’s definitely beyond just one person to get those kinds of changes into a JDK; I really can only make changes in the VM. Plus I was thinking more along the lines of a loop optimization that recognizes certain types of loops and emits instructions appropriately (in this case, the SIMD instructions).
The first Mustang builds are available. Is something that you mentioned already contained in those builds?
http://j2se.dev.java.net/
That’s amazing, you found this out before I did! Anyway yeah that would be where all my changes went to. B12 has everything except for the Math.abs() stuff, that’s in B13. Enjoy
Some quick benchmarking with b12 (compared to 5.0 final) on a Mandelbrot benchmark I have which uses Math.log:
5.0 client - 1219 ms
5.0 server - 906 ms
6.0 client - 1203 ms
6.0 server - 562 ms
So client is basically the same while server is about 60% faster (with this particular example).
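For reference, the hot path is essentially a loop dominated by `Math.log` calls, which is what the new server-compiler intrinsic speeds up. A simplified stand-in (not the actual Mandelbrot benchmark):

```java
public class LogBench {
    // A Math.log-dominated loop; with LN intrinsified, the call
    // overhead disappears and the loop runs substantially faster.
    static double sumLogs(int n) {
        double sum = 0.0;
        for (int i = 1; i <= n; i++) {
            sum += Math.log(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        double s = sumLogs(5000000);
        long t1 = System.nanoTime();
        System.out.println(s + " in " + (t1 - t0) / 1000000 + " ms");
    }
}
```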
Yeah, LN and LOG10 are the only math intrinsics that are not in Client but are in Server. I probably should go back and add them
Actually, to be picky, it’s 38% faster: 1 - 562/906 ≈ 0.38.
Ok I’ve got LN and LOG10 intrinsified in client as well, should be available in b14 or b15.
Is escape analysis going to make it into 6.0? Sure would like to see an end to some of the crappy programming practices I have to employ.
Cas
[quote]I don’t suppose you’d care to champion Structs with me?
Cas
[/quote]
Hi Cas, you might be interested in visiting the Mustang Forums; they are also pushing for structs there, masquerading under extprim or some such.
Seb
[quote]The first Mustang builds are available. Is something that you mentioned already contained in those builds?
http://j2se.dev.java.net/
[/quote]
Here’s the changelog for build 12:
http://forums.java.net/jive/thread.jspa?threadID=58&tstart=105
Seb
Well, Structs have made it into the Top 10 RFEs, and still no reply from Sun.
Structs crosspost:
http://www.java-gaming.org/cgi-bin/JGNetForums/YaBB.cgi?board=Offtopic;action=display;num=1055068121;start=30#36