final modifier

[quote]Urban performance legend #2: Declaring classes or methods final makes them faster

I discussed this myth in October’s column (see Resources), so I won’t rehash it in great detail here. Many articles have recommended making classes or methods final, because it makes it easier for the compiler to inline them and therefore should result in better performance. It’s a nice theory. Too bad it’s not true.

This myth is even more interesting than the synchronization myth, because there’s no data to support it – it just seems plausible (at least the synchronization myth has a flawed microbenchmark to support it). Someone must have decided that it must work this way, told the story with confidence, and once the story got started, it was spread far and wide.

The danger of this myth, just like the synchronization myth, is that it leads developers to compromise good object-oriented design principles for the sake of a nonexistent performance benefit. Whether to make a class final or not is a design decision that should be motivated by an analysis of what the class does, how it will be used and by whom, and whether you can envision ways in which the class might be extended. Making a class final because it is immutable is a good reason to do so; making a complex class final because it hasn’t been designed for extension is also a good reason. Making a class final because you read somewhere that it will run faster (even if it were true) is not.
[/quote]
Quoted from http://www.ibm.com/developerworks/java/library/j-jtp04223.html

[quote]Declaring methods or classes as final in the early stages of a project for performance reasons is a bad idea for several reasons. First, early stage design is the wrong time to think about cycle-counting performance optimizations, especially when such decisions can constrain your design the way using final can. Second, the performance benefit gained by declaring a method or class as final is usually zero. And declaring complicated, stateful classes as final discourages object-oriented design and leads to bloated, kitchen-sink classes because they cannot be easily refactored into smaller, more coherent classes.

Like many myths about Java performance, the erroneous belief that declaring classes or methods as final results in better performance is widely held but rarely examined. The argument goes that declaring a method or class as final means that the compiler can inline method calls more aggressively, because it knows that at run time this is definitely the version of the method that’s going to be called. But this is simply not true. Just because class X is compiled against final class Y doesn’t mean that the same version of class Y will be loaded at run time. So the compiler cannot inline such cross-class method calls safely, final or not. Only if a method is private can the compiler inline it freely, and in that case, the final keyword would be redundant.

On the other hand, the run-time environment and JIT compiler have more information about what classes are actually loaded, and can make much better optimization decisions than the compiler can. If the run-time environment knows that no classes are loaded that extend Y, then it can safely inline calls to methods of Y, regardless of whether Y is final (as long as it can invalidate such JIT-compiled code if a subclass of Y is later loaded). So the reality is that while final might be a useful hint to a dumb run-time optimizer that doesn’t perform any global dependency analysis, its use doesn’t actually enable very many compile-time optimizations, and is not needed by a smart JIT to perform run-time optimizations.
[/quote]
Quoted from http://www.ibm.com/developerworks/java/library/j-jtp1029.html
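A rough way to check the quoted claim on your own JVM is to call the same method through a final class and a non-final class and compare. The sketch below is only an illustration: the class names `OpenStep` and `FinalStep` are made up for this example, the iteration count is arbitrary, and the timings depend entirely on the JVM, its flags and the hardware, so treat the numbers as a hint rather than proof.

```java
public class FinalInliningCheck {

    // Hypothetical classes for illustration; identical except for the final modifier.
    static class OpenStep {
        int step(int x) { return x + 1; }
    }

    static final class FinalStep {
        int step(int x) { return x + 1; }
    }

    // Calls step() through the non-final class.
    static long runOpen(OpenStep s, int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += s.step(i);
        }
        return sum;
    }

    // Calls step() through the final class.
    static long runFinal(FinalStep s, int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += s.step(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        OpenStep open = new OpenStep();
        FinalStep fin = new FinalStep();
        int iterations = 50_000_000; // arbitrary; large enough to trigger JIT compilation

        // Several rounds, so that the later ones run JIT-compiled code.
        for (int round = 0; round < 10; round++) {
            long t0 = System.nanoTime();
            long a = runOpen(open, iterations);
            long t1 = System.nanoTime();
            long b = runFinal(fin, iterations);
            long t2 = System.nanoTime();
            System.out.println("round " + round
                    + ": non-final " + (t1 - t0) / 1_000_000 + " ms"
                    + ", final " + (t2 - t1) / 1_000_000 + " ms"
                    + " (sums " + a + ", " + b + ")");
        }
    }
}
```

Both loops compute the same sum, so any timing gap between the two columns in the later rounds is down to the JIT's treatment of the call, not to the work done.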

This is not really useful: both methods are static, and therefore the “jumped-on” method is exactly the same every time anyway.
In HotSpot it should make only a very small difference for non-static methods (the uncommon-trap check that verifies whether the optimization is still valid); for less sophisticated JVMs as well as static compilers it will help more (e.g. J2ME handsets).

I usually use ProGuard with optimization enabled where it counts; it does such things automatically without wasting developer resources or destroying code :)

Regards, Clemens

SimonH - never, ever benchmark on a single execution of a method. Run the same code multiple times and start looking at the results only after 10 seconds or more.
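The advice above could be sketched as a small helper that throws away the first runs and only reports the later ones. This is a minimal sketch, not a replacement for a proper benchmark harness: `WarmupTimer`, `measure`, `median` and `sumTo` are names invented for this example, and the run counts are arbitrary.

```java
import java.util.Arrays;
import java.util.function.LongSupplier;

public class WarmupTimer {

    // Results are written here so the JIT cannot discard the work as unused.
    static volatile long sink;

    // Runs the task warmupRuns times without timing, then times measuredRuns runs.
    static long[] measure(LongSupplier task, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            sink = task.getAsLong();
        }
        long[] nanos = new long[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            sink = task.getAsLong();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    // Middle element of the sorted samples; less sensitive to outliers than the mean.
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    // Example workload for illustration only.
    static long sumTo(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] samples = measure(() -> sumTo(10_000_000), 20, 20);
        System.out.println("median after warm-up: " + median(samples) / 1_000 + " us");
    }
}
```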

As far as final is concerned, there is one major difference for statics. Javac inlines static final fields that are compile-time constants (primitives or Strings initialized with a constant expression), while non-final statics have to be read with a getstatic each time.
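One way to see this inlining without a disassembler is to watch class initialization: reading a compile-time constant does not touch the declaring class at run time, while reading an ordinary static field does. The sketch below assumes a made-up nested class `Holder`; the field names only mirror the ones used in this thread, with a smaller value.

```java
public class ConstantInliningDemo {

    // Set by Holder's static initializer, so its initialization is observable.
    static boolean holderInitialized = false;

    // Hypothetical holder class for illustration.
    static class Holder {
        static {
            holderInitialized = true;
        }
        // Compile-time constant: javac copies the value into every use site.
        static final int FLOOP = 1000;
        // Ordinary static field: every read compiles to a getstatic on Holder.
        static int SLOOP = 1000;
    }

    // Observations are taken once, while this class is initialized,
    // so the result does not depend on what runs later.
    static final boolean AFTER_CONSTANT_READ;
    static final boolean AFTER_STATIC_READ;
    static final int TOTAL;

    static {
        int f = Holder.FLOOP;                    // compiled as a literal; Holder is not initialized
        AFTER_CONSTANT_READ = holderInitialized;
        int s = Holder.SLOOP;                    // getstatic; triggers Holder's static initializer
        AFTER_STATIC_READ = holderInitialized;
        TOTAL = f + s;
    }

    public static void main(String[] args) {
        System.out.println("after reading FLOOP: Holder initialized = " + AFTER_CONSTANT_READ);
        System.out.println("after reading SLOOP: Holder initialized = " + AFTER_STATIC_READ);
    }
}
```

The first line prints `false` and the second `true`: the read of `FLOOP` never reaches `Holder` because the value was already baked into the calling code by javac.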

In the Test class below, in the static final case, javac is smart enough to optimize away the if/throw part completely. I have no idea why, but under the server compiler the non-final static version is considerably faster, even with its load and extra check… On the other hand, with the client compiler, static final is almost twice as fast before warm-up and the same speed after some time. It seems that in both cases the final compilation makes the static final case a lot worse - it could be interesting to see why.

So there is a difference for sure, and in pathological cases it might be noticeable, but I would not bet on which version is faster - it probably all depends on code alignment in the particular method or something similarly obscure.



public class Test {

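	// Compile-time constant: javac inlines the value at every use
	final static int FLOOP = 1000000000;
	// Ordinary static field: read with getstatic on every access
	static int SLOOP = FLOOP;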
	
	
	public static void main(String[] args) {
		for ( int i =0; i < 10; i++ ) {
			long start = System.currentTimeMillis();
			double d = testStatic();
			if ( d > 0 ) {
				System.out.println("Static " + (System.currentTimeMillis()-start));
			}
			start = System.currentTimeMillis();
			double f = testFinalStatic();
			if ( f > 0 ) {
				System.out.println("Static final " + (System.currentTimeMillis()-start));
			}
		}
	}
	
	
	public static double testStatic() {
		double val = 0;
		for ( int i =0; i < SLOOP; i++ ) {
			val += i;
			if ( SLOOP > SLOOP+SLOOP) {
				throw new IllegalStateException();
			}
		}
		return val;
	}
	
	public static double testFinalStatic() {
		double val = 0;
		for ( int i =0; i < FLOOP; i++ ) {
			val += i;
			if ( FLOOP > FLOOP + FLOOP) {
				throw new IllegalStateException();
			}
		}
		return val;
	}
	
}


And the disassembly, to show that at least in bytecode the static final case should be a lot faster:


public static double testStatic()
    {
        double val = 0.0D;
    //    0    0:dconst_0        
    //    1    1:dstore_0        
        for(int i = 0; i < SLOOP; i++)
    //*   2    2:iconst_0        
    //*   3    3:istore_2        
    //*   4    4:goto            36
        {
            val += i;
    //    5    7:dload_0         
    //    6    8:iload_2         
    //    7    9:i2d             
    //    8   10:dadd            
    //    9   11:dstore_0        
            if(SLOOP > SLOOP + SLOOP)
    //*  10   12:getstatic       #13  <Field int SLOOP>
    //*  11   15:getstatic       #13  <Field int SLOOP>
    //*  12   18:getstatic       #13  <Field int SLOOP>
    //*  13   21:iadd            
    //*  14   22:icmple          33
                throw new IllegalStateException();
    //   15   25:new             #72  <Class IllegalStateException>
    //   16   28:dup             
    //   17   29:invokespecial   #74  <Method void IllegalStateException()>
    //   18   32:athrow          
        }

    //   19   33:iinc            2  1
    //   20   36:iload_2         
    //   21   37:getstatic       #13  <Field int SLOOP>
    //   22   40:icmplt          7
        return val;
    //   23   43:dload_0         
    //   24   44:dreturn         
    }

    public static double testFinalStatic()
    {
        double val = 0.0D;
    //    0    0:dconst_0        
    //    1    1:dstore_0        
        for(int i = 0; i < 0x3b9aca00; i++)
    //*   2    2:iconst_0        
    //*   3    3:istore_2        
    //*   4    4:goto            15
            val += i;
    //    5    7:dload_0         
    //    6    8:iload_2         
    //    7    9:i2d             
    //    8   10:dadd            
    //    9   11:dstore_0        

    //   10   12:iinc            2  1
    //   11   15:iload_2         
    //   12   16:ldc1            #8   <Int 0x3b9aca00>
    //   13   18:icmplt          7
        return val;
    //   14   21:dload_0         
    //   15   22:dreturn         
    }