JumpTables

ajiva · October 4, 2005, 8:37pm

In case anyone is interested, JumpTables made it into Java6 B55 (should be available wednesday or thursday). If you have large switch statements this should help a bit. Give it shot and report any bugs. Thanks.

Mustang bits at: https://mustang.dev.java.net/

erikd · October 18, 2005, 10:13am

I always thought that if you order your switch cases, you already got jump tables since years already ???
What is my confusion here?

K.I.L.E.R · October 18, 2005, 10:42am

If you have switch statements so large that you’re required to optimise them than your code is broken and you should fix it.
Switch statements are jump tables.

All properly written Java code will not see a performance increase with this “feature” in Mustang.

mthornton · October 18, 2005, 11:22am

Large switch statements are quite a reasonable approach for some types of code. Specifically parsers where the code is frequently machine generated.

Riven · October 18, 2005, 11:46am

And how long were you learning Java again?

erikd · October 18, 2005, 1:08pm

You show me how to create a *performing" interpreted CPU emulator without switches.

Riven · October 18, 2005, 1:17pm

Hm… maybe with a HashMap<Integer, Runnable> ? Better to roll your own hashmap implementation then, to support <int, Runnable> (without autoboxing). You’d safe yourself from a massive jump-table, but the Runnables might slow you don’t a bit. I suspect it aint worth the effort for JEmu2

erikd · October 18, 2005, 1:41pm

This approach will lead to 2 performance problems: Bounds checking and indirect function calls through an interface.

I even experimented with an array of interface references so that I can do opcodes[opcode].exec(); but even this is significantly slower than a switch.

pepijnve · October 18, 2005, 3:31pm

I think you’re correct that jumptables have been there all along, at least at a bytecode level. There’s even an instruction dedicated to this (tableswitch). Maybe what Azeem is refering to is that the tableswitch bytecode instruction is now also JIT’ed to a native jumptable?

ajiva · October 18, 2005, 5:43pm

Correct, the bytecodes generated from a switch statement can either be a tableswitch or a lookupswitch (depending on size, sparseness,etc). The previous implementation would create a binary search tree to calculate which branch of the switch statement to take. The Jumptable implementation creates a table in the constant section of the generated code, and in most cases its just a load address of the base of the table, calculate the index and jump to the new destination. The larger the table, the faster the implementation.

erikd · October 18, 2005, 7:43pm

oh sweetness :-* ;D

erikd · October 18, 2005, 9:20pm

I just gave it a try (Java HotSpot™ Client VM (build 1.6.0-ea-b56, mixed mode):

On JEmu2, with Z80 based emulators I see a nice overal performance boost of almost 6%. Pretty impressive
With one MC68000 based emulator I see an overal performance gain of about 9-10% :D, but in another MC68000 emulator, I see a tiny performance degredation of about 2% :-.
However, the latter emulator spends much less time in large switches (more time is spent in video emulation compared to CPU emulation), so I guess this little loss has to do with something else. I’ll try to put my finger on the cause of this degradation (really hard to tell at this point).

But overall, I’m impressed!

EDIT: BTW, I measured against 1.4.2_03 client VM. I’ll measure against a 1.5 later.

blahblahblahh · October 18, 2005, 11:21pm

My memory (back since 1.0.x/1.1.x days) is that you could get something like up to 16 int before it switched to the less direct impl?

K.I.L.E.R · October 19, 2005, 4:54am

3rd year now.
What does that have to do with jump tables?
I believe they are completely unreasonable even when you want to parse large amounts of data.
Just because I can’t think of something better at this time doesn’t mean a better way doesn’t exist.

erikd · October 19, 2005, 7:17am

[quote]I believe they are completely unreasonable even when you want to parse large amounts of data.
Just because I can’t think of something better at this time doesn’t mean a better way doesn’t exist.
[/quote]
Maybe you should first think of a reason why jump tables are unreasonable before claiming that my code is broken.
As long as you can’t think of something, I believe you are wrong. Remember we’re talking about raw performance here, which is not always the same as pretty code.

[quote]My memory (back since 1.0.x/1.1.x days) is that you could get something like up to 16 int before it switched to the less direct impl?
[/quote]
I remember that when you have a properly ordered switch, you get a tableswitch (even if it’s quite large), otherwise you get a lookup switch.

ajiva · October 19, 2005, 3:28pm

Thanks! You can play with the flags to change how large the switch statement has to be before Jumptables are used. The flag is -XX:MinJumpTableSize=
The default is at 16, you can try smaller but I’ve found 16 to be about the right size.

Raghar · October 19, 2005, 8:05pm

With binary search. Then compilation into underlying machine code, or bytecode.

And one more thing.

Markov models. In fact PATRICIA trie would do wonders. ~_^

erikd · October 19, 2005, 10:30pm

No, I don’t have to search anything.

Look, an interpreting CPU emulator is very easy:

while (cyclesLeft > 0) {
int opcode = fetch();
execute(opcode);
}

fetch() just reads an opcode from memory.
Nothing to optimize here, for example:

bus.readByte(PC++);

As I see it, I have 2 options for implementing execute():
1)
switch(opcode) {
case OPCODE_0: opcodeNOP(); break;
case OPCODE_1: opcodeADD(registerA, registerB); break;
etc.
}

A pre populated array with function pointers to implementations of the different opcodes:

opcodes[opcode].exec();

I don’t have to search through the array at all because I know in advance where everything is, so here is nothing to optimize.

The 2nd option is the nice one, but the problem with 2) is bounds checking and indirect calls.
In my experience a switch is much faster.

erikd · October 19, 2005, 10:39pm

Thanks, I’ll play with that.

Just one more question:
Why is there even a lookup switch? Isn’t a JumpTable always faster?

abies · October 20, 2005, 2:00pm

a) Sparse values. Like


switch (i) {
 case 1:
 case 999:
 case 12121212:
 default:
}

In such cases, binary branches are probably the best solution.

b) Degenerated cases like 2-4 switch targets, speculative path on CPU can cover most of the paths, depending on architecture, while having a jump to address taken from memory location is bound to be 100% stopping any kind of lookahead/pipelining.