New VM performance improvements

Use the link Chris provided.

A note to the site admins: When I went back to edit my post I saw that the link is correctly written in the url tag, so I left it like that. Some springs might be loose when generating the page, I guess.

Seb

[quote]I fear that Structs are a hot potato.

But that, in a nutshell, is why I think no-one has replied yet. Not even Azeem…

Cas :slight_smile:
[/quote]
I see no use for structs in an OO-based language. Structs are data with no behavior. Where do you put the behavior? How do you control the state?

I think what you are really asking for is the ability to use the shape of an object to define structure within a buffer or mmap’ed area of memory. This does not require a change in Java to achieve.
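To make the idea concrete, here is a minimal sketch of giving a buffer structure without any language change, assuming a hand-written flyweight. The class name Vector3fView and its methods are hypothetical and not part of NIO; only ByteBuffer and ByteOrder are real API. Note that field access still goes through accessor methods.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical flyweight: a movable window onto packed x, y, z floats in a ByteBuffer. */
final class Vector3fView {
    static final int SIZE_BYTES = 3 * Float.BYTES; // 12 bytes per element

    private final ByteBuffer buffer;
    private int base; // byte offset of the element currently viewed

    Vector3fView(ByteBuffer buffer) {
        this.buffer = buffer;
    }

    /** Points the view at the element with the given index and returns this for chaining. */
    Vector3fView at(int index) {
        this.base = index * SIZE_BYTES;
        return this;
    }

    float x() { return buffer.getFloat(base); }
    float y() { return buffer.getFloat(base + 4); }
    float z() { return buffer.getFloat(base + 8); }

    /** Writes all three components at the current position using absolute puts. */
    void set(float x, float y, float z) {
        buffer.putFloat(base, x);
        buffer.putFloat(base + 4, y);
        buffer.putFloat(base + 8, z);
    }

    /** Behavior lives next to the data, as with any other object. */
    float length() {
        float x = x(), y = y(), z = z();
        return (float) Math.sqrt(x * x + y * y + z * z);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(2 * SIZE_BYTES).order(ByteOrder.nativeOrder());
        Vector3fView v = new Vector3fView(buf);
        v.at(0).set(3f, 4f, 0f);
        v.at(1).set(1f, 2f, 2f);
        System.out.println(v.at(0).length());
        System.out.println(v.at(1).length());
    }
}
```

One view object is reused for every element, so walking the buffer allocates nothing; the cost is that the layout (offsets 0, 4, 8) is maintained by hand.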

I am currently actively arguing against ANY changes to the Java language specification BEFORE we fully understand the implications of the change. I’m not interested in a pro/anti generics discussion, as this is not the forum for it, but… I will use it as an example where the JCP jumped the gun in dropping a flawed version of a change into the language, and now that we have it, it is practically impossible to go back. I would suggest that the average programmer does not have enough knowledge to offer an expert opinion on how to extend a language. I, myself, do not claim to have that knowledge for any language. That said, what I do know is that more language features = more choice = more complexity = more problems. The thing that is wonderful about Java is that it offers a syntax that approaches Smalltalk in simplicity, but with syntax and concepts that make it less foreign to those who need functional structures in order to work.

KEEP JAVA SIMPLE!
8)

You probably don’t see a use for ints then either…

I wish I hadn’t called them Structs now, because they are so not like C-structs at all.

[quote]I see no use for structs in an OO-based language. Structs are data with no behavior. Where do you put the behavior? How do you control the state?
[/quote]
A struct in C++ is just a class like any other with every member declared public. That’s not what I’m after, of course.

That’s dead right, but I am unable to tweak the RFE. What I want is a VM semantic. It can be implemented in pure bytecode by altering the system classloader to generate bytecode on the fly, or the VM can intrinsify it when it spots the markers, eg. extending the abstract class java.nio.Struct.

In particular I don’t want to have to use getXXX()/setXXX() to access fields in the struct if I don’t want to.

When you see the hairy code we have to write to do OpenGL you’ll understand. The worst bit is, even though it’s hairy, it’s still 50% of the speed of the C version :frowning:

Cas :slight_smile:

[quote]You probably don’t see a use for ints then either…

I wish I hadn’t called them Structs now, because they are so not like C-structs at all.
[/quote]
Well, unless I’ve misunderstood what I’ve seen… these structs do look very C-like in that there is space for shape but not for methods. As for getters/setters, I guess you could opt for public instance variables. The problem that I see is that public runs counter to encapsulation. That said, we do sometimes need to violate encapsulation for performance.

Well, when I think back on some of the bit twiddling that I needed to do, it was one instance where OO just didn’t seem to fit, or at least I couldn’t seem to satisfy the demands of both OO and performance. Needless to say, performance needed to win, and it did, and OO was tossed on that particular effort. When we tossed OO, we also opted for C and FORTRAN over Smalltalk (the OO choice at the time). For some reason, I just can’t see myself choosing Java over C in the same situation, and I am an OO bigot ;D

I’m an OO bigot too :slight_smile:

We don’t need member variables to be public, we need them to be accessible without get/set for the purposes of readability within the Struct’s methods itself.

Classic example: Vector3f.

Cas :slight_smile:
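As a rough sketch of what is meant here (the class below is illustrative only, not any library’s actual Vector3f): the fields stay private, yet the class’s own methods read them directly with the dot operator, including on another instance of the same class, with no get/set in sight.

```java
/** Illustrative only: a plain vector class whose fields stay private. */
final class Vector3f {
    private float x, y, z;

    Vector3f(float x, float y, float z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    /** Reads its own fields directly, which keeps the math readable. */
    float length() {
        return (float) Math.sqrt(x * x + y * y + z * z);
    }

    /** Private fields of another Vector3f are also reachable from inside the class. */
    float dot(Vector3f o) {
        return x * o.x + y * o.y + z * o.z;
    }

    public static void main(String[] args) {
        Vector3f a = new Vector3f(3f, 4f, 0f);
        Vector3f b = new Vector3f(1f, 2f, 3f);
        System.out.println(a.length());
        System.out.println(a.dot(b));
    }
}
```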

[quote]I don’t suppose you’d care to champion Structs with me?

Cas :slight_smile:
[/quote]
In a nutshell… not as they are currently defined. But what I would champion with you is the ability of NIO to paste objects into a buffer giving that buffer structure. This is a low level feature that does have some interesting possibilities.

That’s exactly what I want. A way to map a Java object onto a byte buffer. No more, no less.

Cas :slight_smile:

[quote]I’m an OO bigot too :slight_smile:

We don’t need member variables to be public, we need them to be accessible without get/set
[/quote]
humm, not public yet accessible… you don’t mind if I don’t point out the contradiction in that statement, will you??? ooops, I already did :o

and oh, not being 'merican… I almost forgot, happy turkey day

HAAAAPPPYYY TURKEY DAY!!! :slight_smile:

Poor things. And they’re so ugly, too :frowning:

After you’ve been doing OO for a very long time you begin to realise really what OO is all about. The big wet-fish-in-the-face surprise is that the dot operator and public keyword are actually there for a reason! Who’d have thought it? Besides, I want my members private as usual, so I can still write methods that go “return Math.sqrt(x * x + y * y);”

Cas :slight_smile:

[quote]Poor things. And they’re so ugly, too :frowning:

After you’ve been doing OO for a very long time you begin to realise really what OO is all about. The big wet-fish-in-the-face surprise is that the dot operator
[/quote]
dots I could do without… well for method calls anyways. Hotspot should [bold]ALWAYS[bold] inline a getter which should most likely be private 99% of the time anyways. Ask, don’t tell

[quote]Hotspot should [bold]ALWAYS[bold] inline a getter…
[/quote]
And yet often it can’t.

According to some message that I just read on the Mac java-dev mailing list, basic classes like ArrayList can’t have the set(), get() or add() methods inlined properly due to the way the bounds check works. (Something to do with the way the out of bounds exception is thrown.)

Man, ArrayList OWNS its iterator… you’d think bounds checking would be unnecessary in that situation. I guess it’s an either on or off check :expressionless:

[quote]I’m open for suggestions to other improvements that might help.
[/quote]
How about a new “sincos” method? That is, a method that returns both sine and cosine of the same angle. Such a method is useful in many situations (for example quaternion calculation). The alternative is to call Math.sin and Math.cos, but a single method would likely optimize the calculation somehow.

While searching for a better solution to achieve this, I found a piece of code that results in a very good approximation:

float tan = (float)Math.tan(Math.toRadians(rot * 0.5));
float tanSQ = tan * tan;
float tanRCP = 1.0f + tanSQ;
float sin = 2.0f * tan / tanRCP;
float cos = (1.0f - tanSQ) / tanRCP;

For reference, the above code (after VM optimizations) is ~30% faster than the equivalent implementation using Math.sin and Math.cos.
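For anyone who wants to try it, here is the snippet above wrapped in a small self-contained sketch. The class and method names (SinCos, sinCosDegrees) are made up for illustration. The formulas are the tangent half-angle identities, sin a = 2t / (1 + t²) and cos a = (1 - t²) / (1 + t²) with t = tan(a / 2), so any error comes from float rounding rather than from the formula itself. The main method only prints the difference against Math.sin and Math.cos; it does not measure speed.

```java
final class SinCos {
    /** Returns {sin, cos} of an angle given in degrees, using a single Math.tan call. */
    static float[] sinCosDegrees(float rot) {
        float tan = (float) Math.tan(Math.toRadians(rot * 0.5));
        float tanSQ = tan * tan;
        float denom = 1.0f + tanSQ; // 1 + tan^2(a/2), shared by both results
        float sin = 2.0f * tan / denom;
        float cos = (1.0f - tanSQ) / denom;
        return new float[] { sin, cos };
    }

    public static void main(String[] args) {
        float[] angles = { 0f, 30f, 45f, 90f, 135f };
        for (float rot : angles) {
            float[] sc = sinCosDegrees(rot);
            double rad = Math.toRadians(rot);
            // Absolute difference against the library functions
            System.out.printf("%6.1f  sin err=%.2e  cos err=%.2e%n",
                    rot, Math.abs(sc[0] - Math.sin(rad)), Math.abs(sc[1] - Math.cos(rad)));
        }
    }
}
```

Returning a float[] allocates on every call, so a version meant for an inner loop would more likely write into a caller-supplied array.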

It isn’t something we can’t live without, but after seeing what was added to Math in 5.0, it would be nice to have in 6.0.

Except that as a whole the sincos doesn’t add anything to the libraries. Not to mention that x86 would be the only platform to gain from this (having the fsincos instruction). It’s just not very practical and way too specialized to have a “sincos” method.

New improvements?
Of course, that’s easy.

XCHNG instruction for swapping without using a temporary variable.
This means XCHNG int, int
or XCHNG float, float
I think XCHNG Object, Object would work too, but let it be introduced on primitives first.

GC improvements:
If I know I am about to call a method that is really intensive in creating and throwing away objects, it would be nice to give some hint to the GC. Like GC.hint(SOR_EXPECTED, GC_FRAME, 1); //small object release
GC.hint(BIGGOR_EXPECTED, GC_FRAME, 1); //big object release
GC.hint(BIGGOR_EXPECTED, GC_NEXT_METHOD, CLEAN_INSIDE); //This means the next method will create a lot of garbage, and cleaning up before it returns would be much better than swapping it out.
And of course the favorite:
GC.clean(long) //Clean it now. If you want to do any cleaning, you have a budget of long nanoseconds (or int milliseconds). Return immediately if not interested.

Releasing of memory.
It would be nice if the JVM released some memory after 5 minutes when it has had around 60 MB free and unused for a long time. Actually, when you have 800MB allocated and 2GB of swap it doesn’t matter too much, but I might like to be able to run 3 of these programs, just for testing, and then it could (will?) create problems.

And a few questions:

Are you using the MMX registers as a scratchpad? I mean in a PS2 way. 4096 bytes against 64 bytes is a big difference, but it’s handy sometimes.

How effective is array bounds check elimination? For example, if you had
image[a] + image[a+1] + image[a+2];
image[a - 1] + image[a] + image[a+2];
How many times would it check for out of bounds access?

[quote]New improvements?
Of course, that’s easy.

XCHNG instruction for swapping without using a temporary variable.
This means XCHNG int, int
or XCHNG float, float
I think XCHNG Object, Object would work too, but let it be introduced on primitives first.
[/quote]
How do you know it’s not already used?

Worthless. Telling the GC what to do will not improve performance.

I believe Java 5 can release memory back to the OS. Not sure though.

What about adding “clamp” methods to Math?


int clamp(int lower_bound, int value, int upper_bound);

// equivalent to

Math.max(lower_bound, Math.min(upper_bound, value));

And the same for byte, short, long, float, double of course.
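Until something like this exists in Math, a helper along these lines does the job. This is only a sketch: the Clamp class is a made-up name, and the argument order follows the proposal above. Only int, long, float and double are shown, since those are the types Math.max and Math.min are overloaded for.

```java
final class Clamp {
    private Clamp() {} // static helpers only

    static int clamp(int lowerBound, int value, int upperBound) {
        return Math.max(lowerBound, Math.min(upperBound, value));
    }

    static long clamp(long lowerBound, long value, long upperBound) {
        return Math.max(lowerBound, Math.min(upperBound, value));
    }

    static float clamp(float lowerBound, float value, float upperBound) {
        return Math.max(lowerBound, Math.min(upperBound, value));
    }

    static double clamp(double lowerBound, double value, double upperBound) {
        return Math.max(lowerBound, Math.min(upperBound, value));
    }

    public static void main(String[] args) {
        System.out.println(clamp(0, 42, 10));       // value above the range
        System.out.println(clamp(0, -3, 10));       // value below the range
        System.out.println(clamp(0.0f, 0.5f, 1.0f)); // value inside the range
    }
}
```

With this formulation, if the bounds are passed the wrong way round (lower above upper), the result is simply the lower bound; nothing is checked or thrown.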

Azeem, that’s a poor reason for not implementing sincos. If sincos is an instruction supported on 90% of the world’s computers then you may as well add it for those that can gain performance from it. For the remaining machines, it won’t make any difference.

It may be a specialist function but when it comes to performance tuning we are exactly talking about tuning for specialists.

Likewise the clamp() function - maybe not relevant today, but one day maybe intrinsifiable.

And so on, right on to full matrix operations.

Cas :slight_smile:

Hehe, I didn’t know an fsincos instruction exists, it just seemed logical to me that a low-level sincos implementation could be optimized. Actually, the existence of such an instruction somewhat proves its usefulness.