Runtime & Compiler Flags

How do you configure your runtime and the compiler to get maximum performance?

At first, I guess one should use the compiler flag -g:none to disable the generation of debugging info. Then one has to choose the runtime: AFAIK, server starts up slower (than client), but tends to be faster afterwards - right?

Are there any other important flags (like -Xmx, …) or configurations that have a recognizable effect on performance?

The reason why I am asking is that, in my experience, the bytecode compiler and, even more importantly, the JIT do very few optimizations. In my test cases the compiler often doesn't inline properties (especially when the methods override an abstract one), and autoboxing like the following isn't optimized well (any ideas why?):

public class Test
{
    private final float[] elems = new float[16];

    public Test()
    {
        // fill elems with random values
        for (int i = 0; i < elems.length; i++)
            elems[i] = (float) Math.random();
    }

    public Float foo(int index)
    {
        return this.elems[index]; // auto-boxed; or even: return this.bar(index);
    }

    public float bar(int index)
    {
        return this.elems[index];
    }

    public static void main(String[] args)
    {
        for (int loop = 0; loop < 100; loop++) // repeat so the JIT gets a chance to kick in
            test();
    }

    public static void test()
    {
        Test test = new Test();
        float sum;
        long time;

        // test bar (iteration counts were elided in the original post; these are illustrative)
        sum = 0.0f;
        time = System.nanoTime();
        for (int loop = 0; loop < 10000; loop++)
            for (int index = 0; index < 16; index++)
                sum += test.bar(index);
        time = System.nanoTime() - time;
        System.out.println("bar: " + time + " ns, sum = " + sum);

        // test foo: (is 'much' slower than bar)
        sum = 0.0f;
        time = System.nanoTime();
        for (int loop = 0; loop < 10000; loop++)
            for (int index = 0; index < 16; index++)
                sum += test.foo(index); // implicit unboxing on +=
        time = System.nanoTime() - time;
        System.out.println("foo: " + time + " ns, sum = " + sum);
    }
}

best regards

  • michael

If you are talking about the Sun JDK…

The compiler does no optimization, regardless of what flags you set. That's on purpose. Trying to optimize early just makes the important optimization phase, the run-time optimization, harder on the VM.

The VM ("JIT" really is an outdated term, as it's far more than just a JIT these days) does extensive optimization. More, in fact, than any C or C++ compiler I know of. If you are using the client VM then don't, as it leaves a number of important optimizations on the table. Use server and "warm up" your code before engaging the user.
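A minimal sketch of that "warm up before engaging the user" advice (the class name, iteration counts, and the assumed ~10,000-invocation server compile threshold are my own, not from this thread): call the hot method enough times for the VM to compile it before taking any measurements.

```java
public class WarmUp
{
    private static final float[] elems = new float[16];

    private static float bar(int index)
    {
        return elems[index];
    }

    public static void main(String[] args)
    {
        // Warm-up phase: exceed the (assumed) server compile threshold of
        // roughly 10,000 invocations so bar() is JIT-compiled before timing.
        float sink = 0.0f;
        for (int i = 0; i < 20000; i++)
            sink += bar(i & 15);

        // Only now take the measurement the benchmark actually cares about.
        long start = System.nanoTime();
        for (int i = 0; i < 1000000; i++)
            sink += bar(i & 15);
        long elapsed = System.nanoTime() - start;

        // Use the result so the loops cannot be optimized away entirely.
        System.out.println("measured " + elapsed + " ns (sink=" + sink + ")");
        System.out.println("done");
    }
}
```

Run it with the server VM (java -server WarmUp); the point is simply that the timed loop executes compiled code, while the untimed first loop absorbs the interpretation and compilation cost.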

Thanks for answering Jeff,

As I repeated the tests several times (see the loop in the main method), I am asking myself how long such a "warm-up" phase can take. The amount is probably a ratio between the algorithm's execution period and the number of loops, right?

And as for the C/C++ comparison: my experience is that Java code is almost as fast when comparing pure arrays of value types and arithmetic operations; unfortunately, the VM doesn't optimize more object-oriented code that well. Of course, quite an effort goes into optimizing C++ code, since inlining and other settings (like the inline depth for recursive methods, …) have to be set manually. Furthermore, they are static and can't change at runtime, which is a great potential advantage of using a (Java) VM. On the other hand, I believe that Sun's JDK/JRE is still not capable of 'deferred evaluation' in order to simulate the compile-time functionality of C++ expression templates (see "Just when you thought your little language was safe: 'Expression Templates' in Java") - maybe JET can, but I don't have a license to test this.

best regards

When the methods are compiled really depends on the JVM you are using, but Sun has a threshold of 1,500 invocations for client and 10,000 for server; you can also set these values manually (via -XX:CompileThreshold).
However, really expensive optimizations, like inlining of virtual methods and so on, are only done by the server VM…

I personally think it's absolute nonsense to warm up code by hand - it's just a workaround for some design weaknesses and, in my mind, absolutely stupid.
A JIT cache which would cache compiled methods on disk (as JRockit is able to do) would be much more elegant; furthermore, profile information from previous runs could be cached on disk, which would especially help short-running applications a lot!

lg Clemens

I can see why a naive JIT cache wouldn't work for HotSpot (as the precise runtime-compiled state actually changes over time), but it would be a nice thing to serialize the state of a JVM and just load it in one big file-load operation :slight_smile:

Cas :slight_smile:

Well, not for everything - as far as I know, a JIT cache could even hinder more advanced optimizations like inlining or virtual-method optimizations. But today's GUI applications (Swing) have a lot of code and an almost flat profile - except for some hotspots.
So what could be done is compile the flat code optimized for code size, without any problematic optimizations, and re-optimize the hotspot parts.
As far as I know, JRockit has an experimental feature doing exactly this.

However, I am not a JVM specialist and I cannot even think about giving the Sun/HotSpot guys tips on how to do their work. Both JVMs (server/client) are impressive work!

lg Clemens

PS: Does anybody know for which release two-phase compilation is scheduled? (Mustang or Dolphin)

Okay, so pardon my periodic rant, but I need to explain to this newbie his primary error.

Your primary error is this: You wrote a microbenchmark.

HotSpot is brilliant at optimizing real code. But microbenchmarks don't execute like real code. Ergo, you will not get meaningful results from microbenchmarks. The simpler compilers and optimizers in your C compiler may actually perform better because they AREN'T tuned for real code the way HotSpot is.

What microbenchmarks do most often is turn up interesting corners in the work HotSpot does. I haven't analyzed your code in detail because, to be honest, I just don't have the "umph" to do that. In the scores of microbenchmarks I've seen and analyzed in my time on the JDK performance tuning team, and afterwards in this community, the answer was almost always that the benchmark was doing something real code wouldn't, and that was biasing the benchmark.

IF you have a good understanding of the rather complex things the system is doing under the hood, then certain very specific and carefully written microbenchmarks can produce useful results. The vast majority, however, just serve to illustrate one part or another of how the system is designed to eat real code well and simplistic benchmarks poorly.

I personally think the microbenchmarking issue is often used just as a lame excuse for the JVM :wink:

I have tested code like the mentioned expression templates in a 'real world' application; more precisely, I changed the vertex-skinning code in my character animation system to use deferred evaluation, and the JVM definitely cannot handle it. The result is a huge drop in FPS.

Furthermore, it is a serious problem if microbenchmarks aren't meaningful. They should be, at least if the test starts only after the code has already been executed for a given warm-up time and every computed result gets used afterwards (e.g. printing the sum). Otherwise it is a pain in the XXX to write performance-critical code whose behavior doesn't depend on such complexity.

Best regards
-Michael

As far as I can see, your benchmark primarily shows how a design error on your part negatively affects performance (i.e. using autoboxing where you should not).

I don’t think Jeff suggests that the JVM can only optimize complex code well, he’s just rehashing the (IMHO still valid) general point that microbenchmarks often show meaningless results if not done correctly.

That's my point: auto-(un)boxing should never affect performance. Since the wrapper classes are final and there are only getters [xxxValue()], they could be replaced with primitive types in native code and all other methods could be inlined. So IMHO this isn't a design error; this is a limitation of Sun's JIT.

Ridiculous.

The whole point of autoboxing is that primitive types can't fit into the existing code. At least not without a LOT of special-case optimizations in the JVM that would be extremely difficult to deal with. You can't call methods on primitive types, for example… all the code that calls equals()… comparators would have to be magically rewritten by the VM… it makes no sense.

You might as well say that the compiler should figure out when you should have used a linked list instead of an array and magically change it behind the scenes for you. Sure, some day compilers might be advanced enough to do that sort of thing… but they simply aren't yet.

With current technology it is clearly a design error.

Sorry, I don’t see that magic:

Once an instance of a wrapper class is created, the values returned by its methods are constants, because these classes are declared final (no overriding is possible). Furthermore, a wrapper class created from its primitive type is bound to that value, which is constant by definition (a value type cannot be changed, only copied). The evaluation graph is as simple as it can be, just a 3-node chain:

PrimitiveType -> WrapperClass -> PrimitiveType

and the algorithm to remove nodes (here the first two) only has to check whether the expression evaluated for its parent is constant. More precisely, in order to make sure an expression doesn't change:

  1. step: test whether the method depends on any variable
    1.1. if yes, recursively start at 1. to evaluate these variables
    1.2. if no, make sure that both the method and the variable(s) are declared final and therefore cannot change
    1.2.1. if everything is final, remove the expression (optimization)
    1.2.2. if not, heavy runtime analysis may be needed to decide whether the expression can change, and it might not even be possible in multi-threaded environments; but these cases should not occur when using the basic wrapper classes
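As a concrete illustration of that 3-node chain (the class and method names below are hypothetical, mine, not from the thread): the box/unbox round trip is exactly the pattern the argument says could be collapsed to the bare primitive, since Float is final and immutable. Whether a given VM actually performs this reduction is exactly what the thread is debating.

```java
public class BoxRoundTrip
{
    // PrimitiveType -> WrapperClass: boxing
    static Float box(float x)
    {
        return Float.valueOf(x);
    }

    // WrapperClass -> PrimitiveType: unboxing
    static float unbox(Float f)
    {
        return f.floatValue();
    }

    public static void main(String[] args)
    {
        float x = 3.5f;
        // After inlining, unbox(box(x)) is semantically just x: the
        // wrapper is final, immutable, and never escapes this expression,
        // so the whole chain can collapse to the primitive value.
        System.out.println(unbox(box(x)) == x);
    }
}
```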

I hate autoboxing. Do you know what happens when you accidentally have this code?

public Object …
{
    return 0; // obviously it shouldn’t compile, and it should be null
}

I was lucky to be careful and tested why it compiled at all.

Yes, I agree autoboxing can be scary.

Nothing in the code you posted was declared ‘final’, though I doubt that will matter much - maybe the server compiler will do something with it.

The thing is, as pointed out above, this is a very trivial example. In "real" code, 'foo' is likely to be called from many other places, some of which might actually need a Float object, not a float primitive.

Also, you went to the trouble of explicitly telling the compiler that you wanted 'foo' to return a Float object, not a float primitive. If you simply use 'bar' everywhere and eliminate 'foo' entirely, then the autoboxing will happen only when it is truly needed: when the return value of bar needs to be a Float, it will be converted to one. By writing the code as you did, you told the compiler that the return value of 'foo' needs to be a Float, so the value is boxed prematurely. I consider that simply poor code, not a poor compiler… though you could argue either way.
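A sketch of that suggestion (class and variable names are mine): return the primitive from the hot method and let autoboxing happen only at the call sites that genuinely need a Float object.

```java
import java.util.ArrayList;
import java.util.List;

public class BoxOnDemand
{
    private final float[] elems = { 1.0f, 2.0f, 3.0f };

    // Hot method returns the primitive: no allocation on this path.
    public float bar(int index)
    {
        return elems[index];
    }

    public static void main(String[] args)
    {
        BoxOnDemand t = new BoxOnDemand();

        // Hot path: pure primitive arithmetic, no boxing.
        float sum = 0.0f;
        for (int i = 0; i < t.elems.length; i++)
            sum += t.bar(i);
        System.out.println(sum);

        // Boxing happens here, once, because the List needs an object.
        List<Float> cache = new ArrayList<Float>();
        cache.add(t.bar(0)); // autoboxed at this call site only
        System.out.println(cache.get(0));
    }
}
```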

[quote=“swpalmer,post:15,topic:23968”]
I'm talking about the wrapper classes (and indirectly their methods), as I mentioned the 'final' thing, which is sufficient for the desired optimization. But you are wrong anyway, since you didn't notice the final before the float array, which is declared in the Test class. :wink:

[quote=“swpalmer,post:15,topic:23968”]
Again, this kind of optimization can be performed at every wrapper-class method call in the code, by performing a partial evaluation. Therefore it should fit into every 'real code'! The trick is that a wrapper can be seen as a kind of const pointer to a single primitive value: since it is impossible to change either the pointer itself or the value it points at, it can be replaced with the primitive value under all circumstances I can imagine.

As you recognized, the code isn't real code. I wrote it only to see whether wrapper-class conversion is handled efficiently. And just using the primitive-type versions, as you recommend, doesn't work with generics.

Perhaps for methods that are candidates for inlining - the process of inlining would eliminate the temporary Float. That much makes sense. But I'm not sure it is "easy" to do more than that… the code that wraps the primitive type is in a method that, at least sometimes, does need to autobox. Are you suggesting that the VM build some synthetic method that doesn't autobox and use it automatically in cases where the result is just going to be unboxed anyway? That seems to be reaching for corner cases that are less worth pursuing than other optimizations.

[quote]And just using the primitive-type versions, as you recommend, doesn't work with generics.
[/quote]
Good point. I wasn’t aware of that.

[quote=“swpalmer,post:17,topic:23968”]
I agree. Inlining is the best possible optimization here, and this is exactly the result of a (partial) evaluation from new Float(array[i]).floatValue() to array[i], using an algorithm like the one I mentioned above.

The whole thing I was wondering about is that, according to my microbenchmark, no inlining is performed. That's why I asked for flags or something like that, because I remember doing a similar test with the Java 1.4 version, in which HotSpot first appeared, with much better optimization results.

Meanwhile, I read a white paper about Excelsior's JET technology and I can't wait to try the demo version, because it seems this JVM already performs all these optimizations using partial evaluation. In fairness to the Sun JVMs, one has to emphasize that JET uses ahead-of-time compilation and therefore can use much more heavyweight/complex optimization techniques.

[quote=“swpalmer,post:17,topic:23968”]
That may be right. Unfortunately, I don't have enough insight into current JVM technology to confirm or deny that, but it makes sense to me. Moreover, different optimization techniques often conflict, and one has to choose the one which produces the best results overall.

[quote=“zero,post:18,topic:23968”]

Most likely either it wasn't sufficiently warmed up or you weren't running the server VM.

Microbenchmarks are most often misleading. That's really the important take-away here. On real-world apps we are seeing about
a 5% to 10% improvement in performance in 1.5 over 1.4, and that again in 1.6 over 1.5.

Then you don't know enough about either subject.

I'm sorry, I'm really not trying to flame here, but that is a statement that illustrates fundamental ignorance of the problem space.

sigh

Show me the exact code and I guess I will once AGAIN go through the exercise of explaining to you why you hurt yourself.

Unless of course SWP has already adequately explained it.

The answer is well known.

Write clear, clean, well-encapsulated REAL code. Profile. Tune based on the profile.

If you need more help, I suggest you read the book Steve and I wrote years ago on this subject. It's available for free at http://java.sun.com/docs/books/performance.

"It is a poor workman who blames his tools." – Anonymous