how to detect occasional stuttering

I already mentioned the 10% performance loss when converting to instance methods.

Well, it’s yet another ‘issue’ of the VM… it’s darn unpredictable when it comes to optimisations.

I simply had to use pooling to get a real-world (3D) application to run ‘fast enough’; the overall framerate increase was around 350% when doing lots of geometry transformations on the CPU. Since then, I stick to pools, as every time they beat the crap out of any other solution.
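For reference, a minimal sketch of the kind of pool being discussed — the `Vec3f` class and pool API here are illustrative, not the actual code from the application:

```java
import java.util.ArrayDeque;

class Vec3f {
	public float x, y, z;
}

class Vec3fPool {
	private final ArrayDeque<Vec3f> free = new ArrayDeque<Vec3f>();

	// Hands out a recycled instance when available, otherwise allocates.
	public Vec3f get() {
		Vec3f v = free.poll();
		return (v != null) ? v : new Vec3f();
	}

	// Returns an instance for reuse; its fields keep their old values.
	public void put(Vec3f v) {
		free.push(v);
	}
}
```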

Chances are, though, you’d want your Vec3fs initialized to some value when you get them; the VM doesn’t necessarily have to zero the x, y, z members if they are initialized in a constructor. So if both your benchmarks did that, it would be a more realistic comparison, I think.

Cas :slight_smile:

Riven,

Micro-benchmarking is like quantum physics:
observation is disturbed by the measurement itself.

The HotSpot compiler is known for having strange behaviour in
some cases.

I have slightly modified your code to make it more readable:


		// New
		{
			long ts, te;
			ts = System.nanoTime();
			for (int k = 0; k < loops; k++)
				calcNew(a, b, c);
			te = System.nanoTime();
			tNew[i] = te - ts;
		}

		// Pool
		{
			long ts, te;
			ts = System.nanoTime();
			for (int k = 0; k < loops; k++)
				calcPool(a, b, c);
			te = System.nanoTime();
			tPool[i] = te - ts; 
		}

Results:

Java HotSpot(TM) Server VM (build 11.0-b09, mixed mode)

Typical tNew:  [b]64800135[/b]
Typical tPool: [b]3851607[/b]

So I changed the execution order inside the Bench.main loop as follows:


		// Pool
		{
			long ts, te;
			ts = System.nanoTime();
			for (int k = 0; k < loops; k++)
				calcPool(a, b, c);
			te = System.nanoTime();
			tPool[i] = te - ts; 
		}

		// New
		{
			long ts, te;
			ts = System.nanoTime();
			for (int k = 0; k < loops; k++)
				calcNew(a, b, c);
			te = System.nanoTime();
			tNew[i] = te - ts;
		}

and I got:

Java HotSpot(TM) Server VM (build 11.0-b09, mixed mode)

Typical tNew:  [b]4922693[/b]
Typical tPool: [b]64387792[/b]

It seems that the HotSpot compiler doesn’t optimize the first part of the for loop.

Angelo

The problem is that this doesn’t only happen in micro-benchmarks, but also in real-world apps.

It’s trial-and-error all over again! :-\

Maybe I’ll write a natively compiled (DLL) version, without SIMD, to see what the reference point is… one nice thing about native code is that it doesn’t have such weird performance characteristics when seemingly irrelevant changes are made to the source code.

… I’ll write a better benchmark when I get home.

:-X

Actually, no. Skipping initialization is one of the things you explicitly want if you use object pooling. Other than that, writing initializing pools is non-generic and a lot of wasted work.

Hi again,

Well, the first loop runs interpreted first and then is replaced by OSR-compiled code. That’s why my code does a warmup for both code paths, so that I don’t see compilation side effects.
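The warmup idea can be sketched like this: run each code path once before timing it, so the measured run uses compiled code regardless of execution order. The `Runnable` here is a stand-in for the calcNew/calcPool calls from the benchmark above.

```java
class WarmBench {
	// Times 'loops' executions of the given body in nanoseconds.
	static long time(Runnable body, int loops) {
		long ts = System.nanoTime();
		for (int k = 0; k < loops; k++)
			body.run();
		return System.nanoTime() - ts;
	}

	// Runs the body once untimed so the JIT can compile it, then measures.
	static long bench(Runnable body, int loops) {
		time(body, loops);        // warmup run, result discarded
		return time(body, loops); // measured run
	}
}
```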

However, I thought a bit more about why, in real-world apps, allocation may not be as cheap as it is in the benchmark here:

  • The benchmark is a best case for a GC. Almost all objects die in the young generation, which is exactly what the GC was tuned for. Also keep in mind that there are no complex object graphs.
  • In real-world applications, if the young generation is sized too small, a lot of stuff is promoted to the old generation. A GC in the old gen is expensive anyway…

Well, I can just say HotSpot is great. However, I have to admit that some knowledge of how HotSpot does things helps to understand what’s going on…

lg Clemens

Hm… I don’t think you understand how my timing loop and the way I determine the typical duration work… (or I misunderstood you.)

My two methods also get ‘warmed up’ by the VM, and those slow executions are put into tNew/tPool, but after sorting they end up at the end of the long[], which makes them irrelevant, as only the middle of the long[] is read for the ‘typical’ execution time.
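The ‘typical duration’ selection described here could look like this sketch: sort the samples so warmup outliers land at the ends, then read the middle one (the exact benchmark code may differ).

```java
import java.util.Arrays;

class Typical {
	// Returns the median sample, ignoring outliers at both ends.
	static long typical(long[] samples) {
		long[] sorted = samples.clone();
		Arrays.sort(sorted);
		return sorted[sorted.length / 2];
	}
}
```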

Further, the GC isn’t really the bottleneck (you will see a few GC durations of 0.1 ms or so, against a total duration of ~20 ms)
=> so the allocation is what’s slowing everything down.

But heck, I’ll write a better benchmark when I get home tonight…

So you’re going to do work based on unknown stuff? Construction is what you want to skip if you’re pooling; if you want to skip the zero-filling step, use a constructor or a pool like so:

	class Vec3f {
		public float x, y, z;

		Vec3f(float x, float y, float z) {
			this.x = x;
			this.y = y;
			this.z = z;
		}
	}

pool.newVec3f(float x, float y, float z);
(With the pool you know for sure the zero-filling is not done and the Vec3f is only assigned a value, but I think the JVM should notice this too and filter it out)

But that step needs to be in there and should be forced, else it leads to impossibly obscure bugs:
unknown value + 5 = unknown value => unreliable / useless
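A sketch of the initializing pool described above: newVec3f always assigns x, y and z, so a recycled instance never leaks stale values. The pool API and class names are illustrative.

```java
import java.util.ArrayDeque;

class Vec3f {
	public float x, y, z;

	Vec3f(float x, float y, float z) {
		this.x = x;
		this.y = y;
		this.z = z;
	}
}

class InitializingVec3fPool {
	private final ArrayDeque<Vec3f> free = new ArrayDeque<Vec3f>();

	public Vec3f newVec3f(float x, float y, float z) {
		Vec3f v = free.poll();
		if (v == null)
			return new Vec3f(x, y, z); // pool empty: construct
		v.x = x; v.y = y; v.z = z;     // recycle: overwrite every field
		return v;
	}

	public void release(Vec3f v) {
		free.push(v);
	}
}
```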

I thought immutable stuff was promoted for tiny objects. Btw, I also thought optimisations were the act of doing things faster without changing behaviour.

No - I know that objects taken from a pool are uninitialized. There’s nothing problematic with that. If I need them to be initialized, I can do it right after getting them from the pool.

Too much work - I wouldn’t want to create typed pools except by using generics…

Usually I know what I am doing… :stuck_out_tongue:

This won’t happen, since I either initialize pooled objects myself or only use them on the left-hand side of assignments…

Yeah… the application should do the same as before after an optimization - but changing the behaviour of the implementation is inevitable.

I can only speak for myself but I avoid these sharp edges if I’m coding alone; in team environments you need so much red/white tape around these types of things that it will start coming out of your ass.

This boils down to what behaviour is expected/taken for granted and whether people are going to depend on it; in short, it’s dangerous. Especially given that one person designs, one codes, and another one optimises. Differently put, ‘do the same’ can mean different things to different people; some behaviour you don’t find significant might be something someone else is depending upon.

Perhaps you’ll never encounter such a case / the above will never apply to you, but this forum is also read by people in a different situation. That being said, it would be tedious to supply a disclaimer with every single bit of advice given here.

(this also puts “use what you can, when you can” in a very subtle perspective :wink: )

I’d never use object-pooling in a project where others can use that code directly.

I have the advantage (??) to be the only Java programmer in the companies I work in.

No need for red/white tape for me, but I’m lucky.

now what will happen when you retire? :-X

see: ‘advantage (??)’ :-\

Is this really worth a flame?

I agree that pools have some advantages, and sometimes “new” has some advantages … well, and that’s it :wink:
Each technique has its use-cases and advantages or disadvantages.

My hope is that someday soon stack allocation will be implemented in HotSpot, but as some Sun engineers stated, this means many changes to the HotSpot JVM in various places, so … well … I still hope :wink:

lg Clemens