I created a benchmark to test the difference between mapped iteration and plain array iteration. You can download the test code here.
Basically, the idea is that you have a loop somewhere that goes through your mapped data and performs an action on each element. The loop is very simple and you don’t pass the mapped object to another method; everything happens in the loop code. This case is quite simple and doesn’t describe every scenario, but it will be very common.
The important point here is that you don’t care about what happens to the current view offset. The current .view could have been anything before entering the loop, and whatever it holds afterwards is (mostly) useless to the code after the loop. If we assume this is true, the problem is that every time you set the current view in the loop, you’re changing a value in system memory, whereas with array iteration you only change the value of a CPU register. This implies a performance overhead that can become quite big, depending on the complexity of the mapped data and the computation that’s happening. For the simplest case I’m testing (a mapped object holding a single integer), array access is a bit over 3x faster.
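To make the two loop shapes concrete, here is a minimal sketch of what is being compared. It is not the linked benchmark code: MappedInt, sumMapped and sumArray are hypothetical names, and a direct ByteBuffer stands in for whatever memory access the real mapped objects use. It only shows where the per-iteration field write comes from; it makes no timing claims.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ViewCostSketch {

    /** Hypothetical stand-in for a mapped object holding a single int per element. */
    static final class MappedInt {
        static final int SIZEOF = 4;

        final ByteBuffer data;
        int view; // "sticky" current element index, stored in a heap field

        MappedInt(int count) {
            data = ByteBuffer.allocateDirect(count * SIZEOF).order(ByteOrder.nativeOrder());
        }

        int get() {
            return data.getInt(view * SIZEOF);
        }

        void set(int value) {
            data.putInt(view * SIZEOF, value);
        }
    }

    /** Mapped iteration: the view field is written and read on every pass. */
    static long sumMapped(MappedInt foo, int count) {
        long sum = 0;
        for (int i = 0; i < count; i++) {
            foo.view = i;     // field write on every iteration
            sum += foo.get(); // field read to locate the element
        }
        return sum;
    }

    /** Plain array iteration: the index only ever lives in a local variable. */
    static long sumArray(int[] values) {
        long sum = 0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int count = 4;
        MappedInt foo = new MappedInt(count);
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            foo.view = i;
            foo.set(i + 1);
            values[i] = i + 1;
        }
        System.out.println(sumMapped(foo, count) == sumArray(values));
    }
}
```

Both methods compute the same sum; the only structural difference is that sumMapped has to keep foo.view up to date while sumArray does not.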
I guess there are better ways to solve this, but one simple solution would be to introduce a second way to set the current view that is only valid in the current method/scope (.localView or .scopeView?). The user would only need to know/care that .view is “sticky” and .localView is temporary. So, both of these methods would have identical results:
void testView(MappedFoo foo) {
    for ( int i = 0; i < 100; i++ ) {
        foo.view = i;
        // do something with foo
    }
}

void testLocal(MappedFoo foo) {
    for ( int i = 0; i < 100; i++ ) {
        foo.localView = i;
        // do something with foo
    }
}
The only difference is that after testView, foo.view will be equal to 99, while after testLocal it will be whatever it was before. As for what happens under the hood, see methods testView2 and testLocal in the code I linked above. That is what the bytecode transformer should output. If you run the benchmark, you’ll see that testLocal has almost identical performance to testJava.
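For readers without the linked code at hand, here is a rough hand-written sketch of the semantics, not the actual transformer output. MappedFoo, its ByteBuffer backing and the localOffset variable are all assumptions made for illustration. The sticky version writes the field and also refreshes the local, so that .view and .localView could be mixed in one method; the local version touches only the local.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class LocalViewSketch {

    /** Hypothetical mapped type with one int field per element. */
    static final class MappedFoo {
        static final int SIZEOF = 4;

        final ByteBuffer data;
        int view; // sticky view index, survives after the method returns

        MappedFoo(int count) {
            data = ByteBuffer.allocateDirect(count * SIZEOF).order(ByteOrder.nativeOrder());
        }
    }

    /** Sketch of "foo.view = i": update the field, then keep the local in sync. */
    static void testView(MappedFoo foo) {
        int localOffset = foo.view * MappedFoo.SIZEOF;
        for (int i = 0; i < 100; i++) {
            foo.view = i;                           // sticky: visible to callers
            localOffset = foo.view * MappedFoo.SIZEOF;
            foo.data.putInt(localOffset, foo.data.getInt(localOffset) + 1);
        }
    }

    /** Sketch of "foo.localView = i": only the local changes, foo.view is untouched. */
    static void testLocal(MappedFoo foo) {
        int localOffset = foo.view * MappedFoo.SIZEOF; // seeded from the sticky view
        for (int i = 0; i < 100; i++) {
            localOffset = i * MappedFoo.SIZEOF;        // temporary: lives in a local only
            foo.data.putInt(localOffset, foo.data.getInt(localOffset) + 1);
        }
    }

    public static void main(String[] args) {
        MappedFoo foo = new MappedFoo(100);
        foo.view = 7;

        testLocal(foo);
        System.out.println(foo.view); // prints 7, the value set before the call

        testView(foo);
        System.out.println(foo.view); // prints 99, the last index of the loop
    }
}
```

Both loops increment every element once, so the data ends up identical either way; only the value left in foo.view differs.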
Implementation-wise, there are some complications (e.g. what happens on the method stack: we need a stack slot per mapped object for the local address variable), but I think it’s doable. Riven would know better of course. What do you think?
edit: I guess it’s obvious, but the local viewAddress in testView2 is needed so that you can mix/match .view and .localView in the same method.