Once again! fast MappedObjects implementation

Riven · July 11, 2011, 5:54pm

Changes:

Added code that measures how long the transforming takes (please report it if transforming takes too long!)
Refactored the MappedObject.init(…) method out of the public API
Added basic support for ‘view-connected’ MappedObjects


   static void testMappedSet()
   {
      MappedVec2 vec2 = MappedVec2.malloc(3);
      MappedVec3 vec3 = MappedVec3.malloc(3);

      MappedSet2 set = MappedSet.create(vec2, vec3);

      assert (vec2.view == 0);
      assert (vec3.view == 0);

      set.view = 2;
      assert (vec2.view == 2);
      assert (vec3.view == 2);

      set.view = 0;
      assert (vec2.view == 0);
      assert (vec3.view == 0);
   }

Riven · July 11, 2011, 10:02pm

v0.10 is now in lwjgl-util (nightlies) 8)

theagentd · July 12, 2011, 10:06am

Version 0.10 broke something for me! The exact same code that worked with version 0.9 doesn’t work anymore because of the LWJGL library being loaded twice.

Exception in thread "main" java.lang.UnsatisfiedLinkError: Native Library C:\Users\Mokyu\lib\lwjgl-2.7.1\native\windows\lwjgl.dll already loaded in another classloader

My main() function:

public static void main(String[] args) {
    MappedObjectTransformer.register(MappedParticle.class);
    if (MappedObjectClassLoader.fork(ParticleTest7.class, args)) {
        return;
    }

    new ParticleTest7().gameloop();
}

Also, I found out that I was using the client VM for my previous tests. With the server VM I actually get 60 FPS with 1 million particles on a laptop! Insane!
EDIT: With MappedObject on the server VM, I get about a 1.5x increase in raw particle performance (I fill the buffers each frame, but OpenGL isn’t involved at all) compared to puts.

Riven · July 12, 2011, 1:20pm

The class ParticleTest7 probably causes the native libraries of LWJGL to be loaded.

What if you move the main-method out of the class that does anything LWJGL related?

Spasi · July 12, 2011, 1:37pm

The next nightly will have the following improvements:

Additional documentation.
Support for bounds checking. Enabled with -Dorg.lwjgl.util.mapped.Checks=true.
Timing and activity debug output has to be enabled with system properties as well (org.lwjgl.util.mapped.PrintTiming and .PrintActivity). org.lwjgl.util.Debug needs to be true at the same time.
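
For reference, a minimal sketch of how a boolean system property like these is commonly read in plain Java. The property name is the one listed above; the class MappedFlags and the checkIndex helper are hypothetical names made up for illustration, and this is not LWJGL's actual implementation.

```java
class MappedFlags {
    // Boolean.getBoolean returns true only if the system property exists
    // and its value equals "true" (ignoring case).
    static boolean checksEnabled() {
        return Boolean.getBoolean("org.lwjgl.util.mapped.Checks");
    }

    // Hypothetical bounds check: validates the view index only when the flag is set.
    static int checkIndex(int view, int elementCount) {
        if (checksEnabled() && (view < 0 || view >= elementCount)) {
            throw new IndexOutOfBoundsException(
                "view " + view + " out of range [0, " + elementCount + ")");
        }
        return view;
    }

    public static void main(String[] args) {
        System.out.println("checks enabled: " + checksEnabled());
        System.out.println("view accepted: " + checkIndex(2, 3));
    }
}
```

Starting the JVM with -Dorg.lwjgl.util.mapped.Checks=true would make checksEnabled() return true in this sketch; without the flag the check is skipped entirely.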

Spasi · July 12, 2011, 1:47pm

I just noticed that mapping a buffer always uses the base buffer address as the starting point for the mapped object. This doesn’t strictly follow the LWJGL model of always using the current .position() for whatever you’re trying to do. Do you mind if I change it to work that way?
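
To illustrate the difference, here is a small sketch using only java.nio (no LWJGL calls). The class PositionDemo and its method names are hypothetical; it only shows what "start at the base address" versus "start at the current .position()" means for a ByteBuffer.

```java
import java.nio.ByteBuffer;

class PositionDemo {
    // Base-address behaviour: an absolute read ignores position().
    static int firstIntFromBase(ByteBuffer buffer) {
        return buffer.getInt(0);
    }

    // Position-relative behaviour: slice() creates a buffer whose
    // index 0 is the original buffer's current position().
    static int firstIntFromPosition(ByteBuffer buffer) {
        return buffer.slice().getInt(0);
    }

    // Four ints (11, 22, 33, 44) with position() moved to the third one.
    static ByteBuffer sample() {
        ByteBuffer buffer = ByteBuffer.allocateDirect(16);
        buffer.putInt(0, 11).putInt(4, 22).putInt(8, 33).putInt(12, 44);
        buffer.position(8);
        return buffer;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = sample();
        System.out.println("from base:     " + firstIntFromBase(buffer));     // 11
        System.out.println("from position: " + firstIntFromPosition(buffer)); // 33
    }
}
```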

Riven · July 12, 2011, 2:08pm

Sure.

Nice job on the javadoc.

Regarding the logging, IMHO
http://java-game-lib.svn.sourceforge.net/viewvc/java-game-lib/trunk/LWJGL/src/java/org/lwjgl/util/mapped/MappedObjectTransformer.java?revision=3572&view=markup
line 65 is way too important to hide by default.

Spasi · July 12, 2011, 2:22pm

OK, added .position() to mapping and reverted to System.err for the client warning.

theagentd · July 12, 2011, 3:29pm

public class ParticleTest7Launcher {

    public static void main(String[] args) {
        MappedObjectTransformer.register(MappedParticle.class);
        if (MappedObjectClassLoader.fork(ParticleTest7Launcher.class, args)) {
            return;
        }

        new ParticleTest7().gameloop();
    }
}

This doesn’t change anything, still the same error. I suppose the problem is that a library load is triggered during the transformation before the fork. (???)
What am I doing wrong?! T___T

I used a slightly hacky reflection-thingy to check what libraries are loaded at different points in the program. The crash happens on my first use of Display in the constructor of ParticleTest7. However, just before I start creating the Display, the LWJGL library does NOT seem to be loaded yet! My breakpoints in the loadLibrary(String) function also seem to indicate that when Display is used, it tries to load the LWJGL library TWICE. I’m completely confused… As I said, it works like a charm in v0.9…

Riven · July 12, 2011, 4:41pm

If nothing else, I might implement fork(…) in such a way that it really spawns another JVM.

Riven · July 12, 2011, 5:06pm

I can reproduce it, which is good news

Riven · July 12, 2011, 8:08pm

Unfortunately it’s not easy to solve. Rolling back to v0.9 is not an option, as javac generates bytecode that doesn’t quite transform correctly, it seems. The lib was developed in Eclipse, so I never saw the odd bytecode javac generated.
I can tell ASM to make the required stack frame calculations, but that triggers the traversal of classes (including org.lwjgl.Sys), which causes the LWJGL natives to be loaded for the first time. In the new classloader Sys is eventually loaded again, causing the error message you saw.
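
As a rough illustration of why merely looking at a class can have side effects: in plain Java, loading a class and initializing it are separate steps, and only initialization runs static initializers (which is where native libraries typically get loaded). The sketch below uses a hypothetical FakeSys class as a stand-in for something like org.lwjgl.Sys; it does not use ASM, and how the transformer actually resolves classes is not shown here.

```java
class InitDemo {
    // Set by FakeSys's static initializer; stands in for "natives were loaded".
    static boolean nativeLoaded = false;

    // Hypothetical stand-in for a class whose static initializer loads natives.
    static class FakeSys {
        static { InitDemo.nativeLoaded = true; }
    }

    // Loads FakeSys by name; the 'initialize' flag decides whether the
    // static initializer is allowed to run.
    static Class<?> load(boolean initialize) throws ClassNotFoundException {
        String name = InitDemo.class.getName() + "$FakeSys";
        return Class.forName(name, initialize, InitDemo.class.getClassLoader());
    }

    public static void main(String[] args) throws Exception {
        load(false);
        System.out.println("after load(false): " + nativeLoaded);
        load(true);
        System.out.println("after load(true):  " + nativeLoaded);
    }
}
```

Inspecting class hierarchies without initializing them (the load(false) path) is one way to avoid this kind of spurious side effect.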

Riven · July 12, 2011, 9:43pm

Changes (Spasi)

Optional bounds checking on view field.

Bugfix (Riven)

No more double loading libraries caused by the computation of stack frames resulting in spurious class initialization during transform.

theagentd · July 13, 2011, 10:00am

Works wonders! Thank you so much for the fix! Now to try out the MappedSet class… =D

R.D · July 13, 2011, 12:05pm

Puh, I really really want to understand the whole mapped objects stuff, but I can’t get it into my brain (maybe because of exams)… Let’s say I have a bunch of entities and now I want to switch to mapped objects, how do I make this happen?

Anyway, really great stuff. I read through some stuff Spasi posted and it looks like I was blinded by OOP

gouessej · July 13, 2011, 3:15pm

Hi

Can I safely call the clear() method on an instance whose class is a subclass of MappedObject?

Spasi · July 13, 2011, 3:42pm

If you mean that the subclass has a .clear() defined, then it’s safe because MappedObject doesn’t have a .clear(). If you mean you need a .clear() method in MappedObject, that would not be very useful; it’s as simple as setting .view = 0.

Spasi · July 13, 2011, 4:48pm

I created a benchmark to test the difference between mapped iteration and plain array iteration. You can download the test code here.

Basically, the idea is that you have a loop somewhere, you go through your mapped data and perform an action on each element. The loop is very simple and you don’t pass the mapped object to another method, everything happens in the loop code. Although this case is quite simple and doesn’t describe every scenario, it will also be very common.

The important point here is that you don’t care about what happens to the current view offset. The current .view could have been anything before entering the loop and it will be something (mostly) useless after it (for the code after the loop). So, if we assume this is true, the problem is that every time you set the current view in the loop, you’re basically changing a value in system memory. Whereas in the case of array iteration, you only change the value of a CPU register. This implies a performance overhead that can become quite big, depending on the complexity of the mapped data and the computation that’s happening. For the simplest case I’m testing (a mapped object holding a single integer), the performance difference is a bit over 3x in favor of array access.

I guess there are better ways to solve this, but one simple solution would be introducing a second way to set the current view, that would only be valid in the current method/scope (.localView or .scopeView?). The user would only need to know/care that .view is “sticky” and .localView is temporary. So, both of these methods would have identical results:

void testView(MappedFoo foo) {
	for ( int i = 0; i < 100; i++ ) {
		foo.view = i;
		// do something with foo
	}
}

void testLocal(MappedFoo foo) {
	for ( int i = 0; i < 100; i++ ) {
		foo.localView = i;
		// do something with foo
	}
}

except that after testView foo.view will be equal to 99, but after testLocal it will be whatever it was before. As for what happens under the hood, see methods testView2 and testLocal in the code I linked above. That is what the bytecode transformer should output. If you run the benchmark, you’ll see that testLocal has almost identical performance to testJava.
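
For anyone without the linked benchmark at hand, here is a rough emulation of the two semantics in plain Java. This is only a sketch: FakeMappedFoo is a hypothetical stand-in backed by a ByteBuffer, not a real MappedObject, and it is not the testView2/testLocal code from the download. It only shows the "sticky field write per iteration" versus "local variable per iteration" distinction.

```java
import java.nio.ByteBuffer;

class LocalViewDemo {
    // Hypothetical stand-in for a mapped object holding one int per element.
    static final class FakeMappedFoo {
        static final int SIZEOF = 4;
        final ByteBuffer data;
        int view; // "sticky" view index, lives in the object (i.e. in memory)

        FakeMappedFoo(int count) {
            data = ByteBuffer.allocateDirect(count * SIZEOF);
        }
    }

    // Emulates .view: every iteration writes the field, then reads through it.
    static int sumWithView(FakeMappedFoo foo, int count) {
        int sum = 0;
        for (int i = 0; i < count; i++) {
            foo.view = i;
            sum += foo.data.getInt(foo.view * FakeMappedFoo.SIZEOF);
        }
        return sum;
    }

    // Emulates the proposed .localView: the offset is a local variable,
    // so the object's view field is never touched.
    static int sumWithLocalView(FakeMappedFoo foo, int count) {
        int sum = 0;
        for (int i = 0; i < count; i++) {
            int localView = i;
            sum += foo.data.getInt(localView * FakeMappedFoo.SIZEOF);
        }
        return sum;
    }

    // Creates 'count' elements where element i holds the value i.
    static FakeMappedFoo filled(int count) {
        FakeMappedFoo foo = new FakeMappedFoo(count);
        for (int i = 0; i < count; i++) {
            foo.data.putInt(i * FakeMappedFoo.SIZEOF, i);
        }
        return foo;
    }

    public static void main(String[] args) {
        FakeMappedFoo foo = filled(100);
        foo.view = 7;
        System.out.println(sumWithLocalView(foo, 100) + " view=" + foo.view); // 4950 view=7
        System.out.println(sumWithView(foo, 100) + " view=" + foo.view);      // 4950 view=99
    }
}
```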

Implementation-wise, there are some complications (e.g. what happens in the method stack, we need a stack slot for each mapped object, for the local address variable), but I think it’s doable. Riven would know better of course. What do you think?

edit: I guess it’s obvious, but the local viewAddress in testView2 is needed so that you can mix/match .view and .localView in the same method.

Riven · July 13, 2011, 5:01pm

Going to investigate the options.

Simply allocating a local-var-entry for every MappedObject instance is impossible due to flow-control: you can create 1000 instances in a loop and access them all in wildly random patterns.

A ‘simple’ option would be to create a fast-path, if there is no malloc/map/dup/slice or any field/array-access in a method. So simple that we can be sure how many MappedObject instances we have…

Spasi · July 13, 2011, 5:06pm

I’m pretty sure you only need 1 local if you create 1000 instances in a loop. Basically you need 1 for every MappedObject variable in the code. This:

for ( int i = 0; i < 1000; i++ ) {
	MappedFoo foo = MappedFoo.malloc(...);
	// do something with foo
}

requires only 1 local. Whereas this:

for ( int i = 0; i < 1000; i++ ) {
	MappedFoo foo1 = MappedFoo.malloc(...);
	...
	MappedFoo foo2 = foo1.slice(); // or .dup();
	// do something with foo1 and foo2
}

requires 2.