Once again! fast MappedObjects implementation

Changes:

  • Added code that measures how long the transformation takes (please report it if the transformation takes too long!)
  • Refactored the MappedObject.init(…) method out of the public API
  • Added basic support for ‘view-connected’ MappedObjects

   static void testMappedSet()
   {
      MappedVec2 vec2 = MappedVec2.malloc(3);
      MappedVec3 vec3 = MappedVec3.malloc(3);

      MappedSet2 set = MappedSet.create(vec2, vec3);

      assert (vec2.view == 0);
      assert (vec3.view == 0);

      set.view = 2;
      assert (vec2.view == 2);
      assert (vec3.view == 2);

      set.view = 0;
      assert (vec2.view == 0);
      assert (vec3.view == 0);
   }

v0.10 is now in lwjgl-util (nightlies) 8)

Version 0.10 broke something for me! The exact same code that worked with version 0.9 doesn’t work anymore because of the LWJGL library being loaded twice.

Exception in thread "main" java.lang.UnsatisfiedLinkError: Native Library C:\Users\Mokyu\lib\lwjgl-2.7.1\native\windows\lwjgl.dll already loaded in another classloader

My main() function:

public static void main(String[] args) {
    MappedObjectTransformer.register(MappedParticle.class);
    if (MappedObjectClassLoader.fork(ParticleTest7.class, args)) {
        return;
    }

    new ParticleTest7().gameloop();
}

Also, I found out that I was using the client VM for my previous tests. With the server VM I actually get 60 FPS with 1 million particles on a laptop! Insane!
EDIT: With MappedObject on the server VM, I get about a 1.5x increase in raw particle performance (I fill the buffers each frame, but OpenGL isn’t involved at all) compared to puts.

The class ParticleTest7 probably causes the native libraries of LWJGL to be loaded.

What if you move the main-method out of the class that does anything LWJGL related?

The next nightly will have the following improvements:

  • Additional documentation.
  • Support for bounds checking. Enabled with -Dorg.lwjgl.util.mapped.Checks=true.
  • Timing and activity debug output has to be enabled with system properties as well (org.lwjgl.util.mapped.PrintTiming and .PrintActivity). org.lwjgl.util.Debug needs to be true at the same time.
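For anyone wondering how flags like these usually behave, here is a minimal sketch in plain Java. The property names are the ones listed above; the MappedFlags class and its methods are hypothetical and only mirror the rule that the debug output needs org.lwjgl.util.Debug as well. How the library actually reads the properties is an assumption on my part.

```java
// Hypothetical helper, NOT part of LWJGL: it only illustrates how boolean
// system properties such as the ones listed above are commonly evaluated.
final class MappedFlags {

    // Boolean.getBoolean returns true only if the system property exists
    // and its value equals "true" (case-insensitive).
    static boolean checksEnabled() {
        return Boolean.getBoolean("org.lwjgl.util.mapped.Checks");
    }

    // Timing output needs both the global debug flag and its own flag.
    static boolean timingEnabled() {
        return Boolean.getBoolean("org.lwjgl.util.Debug")
            && Boolean.getBoolean("org.lwjgl.util.mapped.PrintTiming");
    }

    // Activity output follows the same rule as timing output.
    static boolean activityEnabled() {
        return Boolean.getBoolean("org.lwjgl.util.Debug")
            && Boolean.getBoolean("org.lwjgl.util.mapped.PrintActivity");
    }

    public static void main(String[] args) {
        System.out.println("checks=" + checksEnabled()
            + " timing=" + timingEnabled()
            + " activity=" + activityEnabled());
    }
}
```

Started with -Dorg.lwjgl.util.mapped.PrintTiming=true alone, timingEnabled() stays false in this sketch; adding -Dorg.lwjgl.util.Debug=true turns it on.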

I just noticed that mapping a buffer always uses the base buffer address as the starting point for the mapped object. This doesn’t strictly follow the LWJGL model of always using the current .position() for whatever you’re trying to do. Do you mind if I change it to work that way?
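To make the difference concrete, here is a small sketch using only java.nio, no mapped objects involved. PositionDemo and its methods are made-up names for illustration. The first method behaves like a mapping that always starts at the base of the buffer, the second like one that starts at the current .position().

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustration only: plain NIO stand-in for "map from base" vs "map from position()".
final class PositionDemo {

    // Builds a 16-byte direct buffer holding the ints 10, 11, 12, 13.
    static ByteBuffer sample() {
        ByteBuffer buffer = ByteBuffer.allocateDirect(16).order(ByteOrder.nativeOrder());
        for (int i = 0; i < 4; i++) {
            buffer.putInt(i * 4, 10 + i); // absolute put, does not move position()
        }
        return buffer;
    }

    // Ignores position(): always reads relative to the base of the buffer.
    static int firstIntFromBase(ByteBuffer buffer) {
        return buffer.getInt(0);
    }

    // Honours position(): slice() starts at the current position.
    // A slice is always BIG_ENDIAN at first, so the byte order is copied over.
    static int firstIntFromPosition(ByteBuffer buffer) {
        ByteBuffer view = buffer.slice().order(buffer.order());
        return view.getInt(0);
    }

    public static void main(String[] args) {
        ByteBuffer buffer = sample();
        buffer.position(8); // skip the first two ints
        System.out.println("from base:     " + firstIntFromBase(buffer));
        System.out.println("from position: " + firstIntFromPosition(buffer));
    }
}
```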

Sure.

Nice job on the javadoc.

Regarding the logging, IMHO
http://java-game-lib.svn.sourceforge.net/viewvc/java-game-lib/trunk/LWJGL/src/java/org/lwjgl/util/mapped/MappedObjectTransformer.java?revision=3572&view=markup
line 65 is way too important to hide by default.

OK, added .position() to mapping and reverted to System.err for the client warning.

public class ParticleTest7Launcher {

    public static void main(String[] args) {
        MappedObjectTransformer.register(MappedParticle.class);
        if (MappedObjectClassLoader.fork(ParticleTest7Launcher.class, args)) {
            return;
        }

        new ParticleTest7().gameloop();
    }
}

This doesn’t change anything, still the same error. I suppose the problem is that a library load is triggered during the transformation before the fork. (???)
What am I doing wrong?! T___T

I used a slightly hacky reflection-thingy to check what libraries are loaded at different points in the program. The crash happens on my first use of Display in the constructor of ParticleTest7. However, just before I start creating the Display, the LWJGL library does NOT seem to be loaded yet! My breakpoints in the loadLibrary(String) function also seem to indicate that when Display is used, it tries to load the LWJGL library TWICE. I’m completely confused… As I said, it works like a charm in v0.9…

If nothing else, I might implement fork(…) in such a way that it really spawns another JVM.

I can reproduce it, which is good news :slight_smile:

Unfortunately it’s not easy to solve. Rolling back to v0.9 is not an option, as javac seems to generate bytecode that doesn’t quite transform correctly. The lib was developed in Eclipse, so I never saw the odd bytecode javac generated.
I can tell ASM to make the required stack frame calculations, but that triggers the traversal of classes (including org.lwjgl.Sys), which causes the LWJGL natives to be loaded for the first time. In the new classloader, Sys is eventually loaded again, causing the error message you saw.
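For readers who haven't run into this before: loading a class and initializing it are two separate steps, and only initialization runs static initializers (which is where native libraries typically get loaded). The sketch below shows that difference with plain JDK calls. InitDemo and Holder are made-up names, and this is not the fix that went into the library, just an illustration of why touching a class during the transform can have side effects.

```java
// Illustration only: Holder stands in for a class whose static initializer
// has a side effect, such as loading a native library.
final class InitDemo {

    static boolean holderInitialized = false;

    static final class Holder {
        static {
            // Runs once, when Holder is initialized (not when it is merely loaded).
            InitDemo.holderInitialized = true;
        }
    }

    // Returns { initialized after load only, initialized after forced init }.
    static boolean[] probe() {
        try {
            // A class literal does not trigger initialization on its own.
            String name = Holder.class.getName();
            ClassLoader loader = InitDemo.class.getClassLoader();

            Class.forName(name, false, loader); // load without initializing
            boolean afterLoad = holderInitialized;

            Class.forName(name, true, loader);  // load and initialize
            boolean afterInit = holderInitialized;

            return new boolean[] { afterLoad, afterInit };
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

On the first call in a fresh JVM, probe() reports that the static initializer has not run after the non-initializing lookup and has run after the initializing one.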

Changes (Spasi)

  • Optional bounds checking on view field.

Bugfix (Riven)

  • No more double loading of libraries, which was caused by the computation of stack frames triggering spurious class initialization during the transform.

Works wonders! Thank you so much for the fix! Now to try out the MappedSet class… =D

Phew, I really, really want to understand the whole mapped objects stuff, but I can’t get it into my brain (maybe because of exams)… Let’s say I have a bunch of entities and now I want to switch to mapped objects, how do I make this happen?

Anyway, really great stuff. I read through some stuff Spasi posted and it looks like I was blinded by OOP :smiley:

Hi

Can I safely call the clear() method on an instance whose class is a subclass of MappedObject?

If you mean that the subclass has a .clear() defined, then it’s safe because MappedObject doesn’t have a .clear(). If you mean you need a .clear() method in MappedObject, that would not be very useful; it’s as simple as setting .view = 0.

I created a benchmark to test the difference between mapped iteration and plain array iteration. You can download the test code here.

Basically, the idea is that you have a loop somewhere, you go through your mapped data and perform an action on each element. The loop is very simple and you don’t pass the mapped object to another method, everything happens in the loop code. Although this case is quite simple and doesn’t describe every scenario, it will also be very common.

The important point here is that you don’t care about what happens to the current view offset. The current .view could have been anything before entering the loop and it will be something (mostly) useless after it (for the code after the loop). So, if we assume this is true, the problem is that every time you set the current view in the loop, you’re basically changing a value in system memory. Whereas in the case of array iteration, you only change the value of a CPU register. This implies a performance overhead that can become quite big, depending on the complexity of the mapped data and the computation that’s happening. For the simplest case I’m testing (a mapped object holding a single integer), the performance difference is a bit over 3x in favor of array access.

I guess there are better ways to solve this, but one simple solution would be introducing a second way to set the current view, that would only be valid in the current method/scope (.localView or .scopeView?). The user would only need to know/care that .view is “sticky” and .localView is temporary. So, both of these methods would have identical results:

void testView(MappedFoo foo) {
	for ( int i = 0; i < 100; i++ ) {
		foo.view = i;
		// do something with foo
	}
}

void testLocal(MappedFoo foo) {
	for ( int i = 0; i < 100; i++ ) {
		foo.localView = i;
		// do something with foo
	}
}

except that after testView foo.view will be equal to 99, but after testLocal it will be whatever it was before. As for what happens under the hood, see methods testView2 and testLocal in the code I linked above. That is what the bytecode transformer should output. If you run the benchmark, you’ll see that testLocal has almost identical performance to testJava.

Implementation-wise, there are some complications (e.g. what happens in the method stack, we need a stack slot for each mapped object, for the local address variable), but I think it’s doable. Riven would know better of course. What do you think?

edit: I guess it’s obvious, but the local viewAddress in testView2 is needed so that you can mix/match .view and .localView in the same method.
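To show the intended semantics without the bytecode transformer, here is a rough plain-Java stand-in. FakeMappedInt is a made-up class, not LWJGL code, and it is not the benchmark code linked above. The view loop writes the object's field on every iteration, while the local loop keeps the offset in a local variable and leaves the sticky view alone.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-in for a mapped object holding a single int per element.
final class FakeMappedInt {
    static final int SIZEOF = 4;

    final ByteBuffer memory;
    int view; // the "sticky" view: a field stored in the object, i.e. in memory

    FakeMappedInt(int count) {
        memory = ByteBuffer.allocateDirect(count * SIZEOF).order(ByteOrder.nativeOrder());
    }

    // Stores the element index as the value of each element.
    void fill() {
        for (int i = 0; i < memory.capacity() / SIZEOF; i++) {
            memory.putInt(i * SIZEOF, i);
        }
    }

    // Pattern of testView: every iteration updates the view field.
    static long sumWithView(FakeMappedInt foo, int count) {
        long sum = 0;
        for (int i = 0; i < count; i++) {
            foo.view = i;
            sum += foo.memory.getInt(foo.view * SIZEOF);
        }
        return sum;
    }

    // Pattern of testLocal: the offset lives in a local, foo.view is untouched.
    static long sumWithLocalView(FakeMappedInt foo, int count) {
        long sum = 0;
        for (int i = 0; i < count; i++) {
            int localOffset = i * SIZEOF;
            sum += foo.memory.getInt(localOffset);
        }
        return sum;
    }

    public static void main(String[] args) {
        FakeMappedInt foo = new FakeMappedInt(100);
        foo.fill();
        foo.view = 7;
        System.out.println(sumWithLocalView(foo, 100) + " view=" + foo.view);
        System.out.println(sumWithView(foo, 100) + " view=" + foo.view);
    }
}
```

Both loops compute the same sum; the only observable difference is what foo.view holds afterwards. This sketch says nothing about the actual speed difference, which depends on what the JIT does with the field write.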

Going to investigate the options.

Simply allocating a local-var-entry for every MappedObject instance is impossible due to flow-control: you can create 1000 instances in a loop and access them all in wildly random patterns.

A ‘simple’ option would be to create a fast-path, if there is no malloc/map/dup/slice or any field/array-access in a method. So simple that we can be sure how many MappedObject instances we have…

I’m pretty sure you only need 1 local if you create 1000 instances in a loop. Basically you need 1 for every MappedObject variable in the code. This:

for ( int i = 0; i < 1000; i++ ) {
	MappedFoo foo = MappedFoo.malloc(...);
	// do something with foo
}

requires only 1 local. Whereas this:

for ( int i = 0; i < 1000; i++ ) {
	MappedFoo foo1 = MappedFoo.malloc(...);
	...
	MappedFoo foo2 = foo1.slice(); // or .dup();
	// do something with foo1 and foo2
}

requires 2.