how to detect occasional stuttering

I already mentioned the 10% performance loss when converting to instance methods.

Well, it’s yet another ‘issue’ of the VM… it’s darn unpredictable when it comes to optimisations.

I simply had to use pooling to get a real-world (3D) application to run ‘fast enough’; the overall framerate increase was around 350% when doing lots of geometry transformations on the CPU. Since then, I stick to pools, as every time they beat the crap out of any other solution.
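For reference, a minimal sketch of the kind of pool being discussed — the `Vec3f` class and pool API here are illustrative, not the actual code from the application:

```java
import java.util.ArrayDeque;

class Vec3f {
	public float x, y, z;
}

class Vec3fPool {
	private final ArrayDeque<Vec3f> free = new ArrayDeque<Vec3f>();

	// Hands out a recycled instance when available, otherwise allocates.
	public Vec3f get() {
		Vec3f v = free.poll();
		return (v != null) ? v : new Vec3f();
	}

	// Returns an instance for reuse; its fields keep their old values.
	public void put(Vec3f v) {
		free.push(v);
	}
}
```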

Chances are, though, you’d want your Vec3fs initialized to some value when you get them; the VM doesn’t necessarily have to zero the x, y, z members if they are initialized in a constructor. So if both your benchmarks did that, it would be a more realistic comparison, I think.

Cas :slight_smile:

Riven,

Micro-benchmarking is like quantum physics:
observation is disturbed by the measurement itself.

The HotSpot compiler is known for having strange behaviour in
some cases.

I have slightly modified your code to make it more readable:


		// New
		{
			long ts, te;
			ts = System.nanoTime();
			for (int k = 0; k < loops; k++)
				calcNew(a, b, c);
			te = System.nanoTime();
			tNew[i] = te - ts;
		}

		// Pool
		{
			long ts, te;
			ts = System.nanoTime();
			for (int k = 0; k < loops; k++)
				calcPool(a, b, c);
			te = System.nanoTime();
			tPool[i] = te - ts; 
		}

Results:

Java HotSpot(TM) Server VM (build 11.0-b09, mixed mode)

Typical tNew:  [b]64800135[/b]
Typical tPool: [b]3851607[/b]

So I changed the execution order inside the Bench.main loop as follows:


		// Pool
		{
			long ts, te;
			ts = System.nanoTime();
			for (int k = 0; k < loops; k++)
				calcPool(a, b, c);
			te = System.nanoTime();
			tPool[i] = te - ts; 
		}

		// New
		{
			long ts, te;
			ts = System.nanoTime();
			for (int k = 0; k < loops; k++)
				calcNew(a, b, c);
			te = System.nanoTime();
			tNew[i] = te - ts;
		}

and I got:

Java HotSpot(TM) Server VM (build 11.0-b09, mixed mode)

Typical tNew:  [b]4922693[/b]
Typical tPool: [b]64387792[/b]

It seems that the HotSpot compiler doesn’t optimize the first part of the for loop.

Angelo

The problem is that this doesn’t only happen in micro-benchmarks, but also in real-world apps.

It’s trial-and-error all over again! :-\

Maybe I’ll write a natively compiled (DLL) version, without SIMD, to see what the reference point is… one nice thing about native code is that it doesn’t have such weird performance characteristics when seemingly irrelevant changes are made to the source code.

… I’ll write a better benchmark when I get home.

:-X

Actually, no. Skipping initialization is one of the things you explicitly want if you use object pooling. Other than that, writing initializing pools is non-generic and a lot of wasted work.

Hi again,

Well, the first loop runs interpreted first and then is replaced by OSR-compiled code. That’s why my code does a warmup for both code paths, so that I don’t see compilation side effects.
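The warmup idea can be sketched like this: run each code path once before timing it, so the measured run uses compiled code regardless of execution order. The `Runnable` here is a stand-in for the calcNew/calcPool calls from the benchmark above.

```java
class WarmBench {
	// Times 'loops' executions of the given body in nanoseconds.
	static long time(Runnable body, int loops) {
		long ts = System.nanoTime();
		for (int k = 0; k < loops; k++)
			body.run();
		return System.nanoTime() - ts;
	}

	// Runs the body once untimed so the JIT can compile it, then measures.
	static long bench(Runnable body, int loops) {
		time(body, loops);        // warmup run, result discarded
		return time(body, loops); // measured run
	}
}
```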

However, I thought a bit more about why, in real-world apps, allocation may not be as cheap as it is in the benchmark here:

  • The benchmark is a best case for a GC. Almost all objects die in the young generation, which is exactly what the GC was tuned for. Also keep in mind that there are no complex object graphs.
  • In real-world applications, if the young generation is sized too small, a lot of stuff is promoted to the old generation. A GC in the old gen is expensive anyway…

Well, I can just say HotSpot is great. However, I have to admit that some knowledge of how HotSpot does things helps to understand what’s going on…

lg Clemens

Hm… I don’t think you understand how my timing loop and the way I determine the typical duration work… (or I misunderstood you.)

My two methods also get ‘warmed up’ by the VM, and those slow executions are put into tNew/tPool, but after sorting they end up at the end of the long[], which makes them irrelevant, as only the middle of the long[] is read for the ‘typical’ execution time.
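The ‘typical duration’ selection described here could look like this sketch: sort the samples so warmup outliers land at the ends, then read the middle one (the exact benchmark code may differ).

```java
import java.util.Arrays;

class Typical {
	// Returns the median sample, ignoring outliers at both ends.
	static long typical(long[] samples) {
		long[] sorted = samples.clone();
		Arrays.sort(sorted);
		return sorted[sorted.length / 2];
	}
}
```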

Further, the GC isn’t really the bottleneck (you will see a few GC durations of 0.1 ms or so, against a total duration of ~20 ms)
=> so the allocation is what’s slowing everything down.

But heck, I’ll write a better benchmark when I get home tonight…

So you’re going to do work based on unknown stuff? Construction is what you want to skip if you’re pooling; if you want to skip the zero-filling step, use a constructor or a pool like so:

	class Vec3f {
		public float x, y, z;

		Vec3f(float x, float y, float z) {
			this.x = x;
			this.y = y;
			this.z = z;
		}
	}

pool.newVec3f(float x, float y, float z);
(With the pool you know for sure the zero-filling is not done and the Vec3f is only assigned a value, but I think the JVM should notice this too and filter it out)

But that step needs to be in there and should be forced, else it leads to impossibly obscure bugs:
unknown value + 5 = unknown value => unreliable / useless
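A sketch of the initializing pool described above: newVec3f always assigns x, y and z, so a recycled instance never leaks stale values. The pool API and class names are illustrative.

```java
import java.util.ArrayDeque;

class Vec3f {
	public float x, y, z;

	Vec3f(float x, float y, float z) {
		this.x = x;
		this.y = y;
		this.z = z;
	}
}

class InitializingVec3fPool {
	private final ArrayDeque<Vec3f> free = new ArrayDeque<Vec3f>();

	public Vec3f newVec3f(float x, float y, float z) {
		Vec3f v = free.poll();
		if (v == null)
			return new Vec3f(x, y, z); // pool empty: construct
		v.x = x; v.y = y; v.z = z;     // recycle: overwrite every field
		return v;
	}

	public void release(Vec3f v) {
		free.push(v);
	}
}
```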

I thought immutable stuff was promoted for tiny objects. Btw, I also thought optimisations were the act of doing things faster without changing behaviour.

No - I know that objects taken from a pool are uninitialized. There’s nothing problematic with that. If I need them to be initialized, I can do it right after getting them from the pool.

Too much work - I wouldn’t want to create typed pools except by using generics…

Usually I know what I am doing… :stuck_out_tongue:

This won’t happen, since I either initialize pooled objects myself or only use them on the left-hand side of assignments…

Yeah… the application should do the same as before after an optimization - but changing the behaviour of the implementation is inevitable.

I can only speak for myself but I avoid these sharp edges if I’m coding alone; in team environments you need so much red/white tape around these types of things that it will start coming out of your ass.

This boils down to what behaviour is expected/taken for granted and whether people are going to depend on it; in short, it’s dangerous. Especially given that one person designs, one codes, and another one optimises. Differently put, ‘do the same’ can mean different things to different people; some behaviour you don’t find significant might be something someone else is depending upon.

Perhaps you’ll never encounter such a case / the above will never apply to you, but this forum is also read by people in a different situation. That being said, it would be tedious to supply a disclaimer with every single bit of advice given here.

(this also puts “use what you can, when you can” in a very subtle perspective :wink: )

I’d never use object-pooling in a project where others can use that code directly.

I have the advantage (??) to be the only Java programmer in the companies I work in.

No need for red/white tape for me, but I’m lucky.

now what will happen when you retire? :-X

see: ‘advantage (??)’ :-\

Is this really worth a flame?

I agree that pools have some advantages, and sometimes “new” has some advantages … well, and that’s it :wink:
Each technique has its use-cases and advantages or disadvantages.

My hope is that someday soon stack allocation will be implemented in HotSpot, but as some Sun engineers stated, this means many changes to the HotSpot JVM in various places, so … well … I still hope :wink:

lg Clemens