I’ve created a little microbenchmark to test the relative costs of a few different ways of getting a temporary object inside a short method, and I thought the results might be of interest to other people.
My test computes the cross product of two 3D vectors and returns the result in the first vector. This is a small but useful real-world operation, and because the result overwrites the first input, it needs some temporary space while the cross product is computed. I coded up multiple versions that use different techniques to get that temporary space:
[1] Local var. This method just used local double variables for its temporary space. This is the only version that does not use an object (a Vector3d) for its temporary space.
[2] New. Allocate a new Vector3d each time the method is called.
[3] ThreadLocal. Get a temporary object using a ThreadLocal object.
[4] Field. Get a temporary object stored in a private field.
[5] Field sync. Synchronized method which gets its temporary from a private field as in [4].
[6] TempStack. Get a temporary object from a TempStack, which is essentially an object pool where objects must be returned in the reverse order that they were obtained. The TempStack itself is obtained using a ThreadLocal.
[7] TempStack param. Use a TempStack passed in an explicit extra parameter. Ugly because it requires an extra parameter, but it can be relatively fast.
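A minimal sketch of how techniques [1], [3], [6], and [7] might look. Vec3 is a hypothetical stand-in for Vector3d, and the TempStack API shown (get, release, depth) is my own assumption; the TempStack in the linked source may differ. The code uses Java 8 features (generics, ThreadLocal.withInitial, method references) that the 1.4.2 JVMs benchmarked here do not have.

```java
import java.util.ArrayList;

// Hypothetical stand-in for Vector3d (not the javax.vecmath class).
final class Vec3 {
    double x, y, z;

    Vec3() { }

    Vec3(double x, double y, double z) {
        this.x = x; this.y = y; this.z = z;
    }

    void set(Vec3 o) {
        x = o.x; y = o.y; z = o.z;
    }
}

// Hypothetical TempStack: a LIFO pool of reusable Vec3 temporaries.
final class TempStack {
    private final ArrayList<Vec3> pool = new ArrayList<>();
    private int top = 0; // number of temporaries currently handed out

    Vec3 get() {
        if (top == pool.size()) {
            pool.add(new Vec3()); // grow the pool the first time this depth is reached
        }
        return pool.get(top++);
    }

    // Must be called with the most recently obtained temporary.
    void release(Vec3 v) {
        if (top == 0 || pool.get(top - 1) != v) {
            throw new IllegalStateException("temporaries must be released in reverse order");
        }
        top--;
    }

    int depth() {
        return top;
    }
}

final class CrossOps {
    // [3] One private temporary per thread.
    private static final ThreadLocal<Vec3> TEMP = ThreadLocal.withInitial(Vec3::new);
    // [6] One TempStack per thread.
    private static final ThreadLocal<TempStack> STACK = ThreadLocal.withInitial(TempStack::new);

    // [1] a = a x b using local doubles. a.x cannot be overwritten first,
    // because the y and z components still need its old value.
    static void crossLocalVars(Vec3 a, Vec3 b) {
        double cx = a.y * b.z - a.z * b.y;
        double cy = a.z * b.x - a.x * b.z;
        double cz = a.x * b.y - a.y * b.x;
        a.x = cx; a.y = cy; a.z = cz;
    }

    // t = a x b; t must not be the same object as a or b.
    private static void cross(Vec3 a, Vec3 b, Vec3 t) {
        t.x = a.y * b.z - a.z * b.y;
        t.y = a.z * b.x - a.x * b.z;
        t.z = a.x * b.y - a.y * b.x;
    }

    // [3] a = a x b using this thread's ThreadLocal temporary.
    static void crossThreadLocal(Vec3 a, Vec3 b) {
        Vec3 t = TEMP.get();
        cross(a, b, t);
        a.set(t);
    }

    // [6] a = a x b using a TempStack fetched from a ThreadLocal.
    static void crossTempStack(Vec3 a, Vec3 b) {
        crossTempStack(a, b, STACK.get());
    }

    // [7] a = a x b using a TempStack passed in explicitly.
    static void crossTempStack(Vec3 a, Vec3 b, TempStack ts) {
        Vec3 t = ts.get();
        try {
            cross(a, b, t);
            a.set(t);
        } finally {
            ts.release(t); // keeps the stack balanced even if cross() throws
        }
    }
}
```

The identity check in release makes an out-of-order release fail loudly instead of silently handing the same temporary to two callers.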
Method 4 is not thread-safe, and methods 3, 4, and 5 cannot be used in recursive methods. Method 2 is the cleanest of the object-based methods, but how does its performance compare to the others? Here are some timings from my 1.7 GHz Pentium 4 machine:
[tr][td]Test[/td][td]JVM 1.4.2 -client (µs)[/td][td]JVM 1.4.2 -server (µs)[/td][/tr]
[tr][td](1) Local var[/td][td]0.076[/td][td]0.015[/td][/tr]
[tr][td](2) New[/td][td]0.144[/td][td]0.120[/td][/tr]
[tr][td](3) ThreadLocal[/td][td]0.100[/td][td]0.039[/td][/tr]
[tr][td](4) Field[/td][td]0.048[/td][td]0.040[/td][/tr]
[tr][td](5) Field sync[/td][td]0.205[/td][td]0.216[/td][/tr]
[tr][td](6) TempStack[/td][td]0.127[/td][td]0.047[/td][/tr]
[tr][td](7) TempStack param[/td][td]0.069[/td][td]0.016[/td][/tr]
Times are in microseconds per method call, and you can get the complete source code here: http://www.graphics.cornell.edu/~bjw/CrossProductTest.java
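For readers who want to run similar measurements, here is a minimal timing helper. It is a hypothetical sketch, not the harness in CrossProductTest.java. It uses System.nanoTime, which requires Java 5 or later (so it would not run on the 1.4.2 JVMs above), and is written to be called with a Java 8 lambda.

```java
// Hypothetical timing helper; not the harness from the linked source.
final class MicroBench {
    // Runs op `warmup` times untimed (giving the JIT a chance to compile it),
    // then `calls` times timed, and returns the average microseconds per call.
    static double microsPerCall(Runnable op, int warmup, int calls) {
        for (int i = 0; i < warmup; i++) {
            op.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            op.run();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1000.0 / calls; // ns -> µs, then per call
    }
}
```

Wrap the cross-product variant under test in the Runnable and compare the averages. The interface call adds overhead of its own, so treat the numbers as relative; on current JVMs, OpenJDK's JMH harness handles warm-up and dead-code elimination more rigorously.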
A few things to note from the results:
[*] As others have noted, under 1.4.2, -server is much faster than -client for floating-point code.
[*] The difference between (1) and (2) gives the approximate cost of allocating and garbage collecting the temporary Vector3d object. Allocation increases the cost of the cross product method by a factor of roughly 2 (client) to 8 (server), so it is still a very significant cost in this case.
[*] The synchronized method is the most expensive in all cases, so it is still best to avoid synchronization when possible.
[*] We use a technique similar to (7) in performance-critical sections of our own code, and I would happily change the code to something cleaner like (2) if the cost were small. However, the cleaner object-based techniques are still significantly slower.
[*] Although I expected (1) to be the fastest, under -client it is actually slower than (4) and (7), for reasons I don’t understand.
[*] The field method (4) seems to be relatively slow under -server, for reasons I also don’t understand.
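The allocation estimate in the second note follows from the (1) Local var and (2) New rows of the table (times in µs per call):

```latex
\begin{aligned}
\text{client:}\quad & 0.144 - 0.076 = 0.068\ \mu\text{s}, & \frac{0.144}{0.076} &\approx 1.9 \\
\text{server:}\quad & 0.120 - 0.015 = 0.105\ \mu\text{s}, & \frac{0.120}{0.015} &= 8.0
\end{aligned}
```

The absolute overhead of allocation plus collection is similar under both JVMs (roughly 0.07 to 0.1 µs per call); the much larger server ratio comes mainly from its much faster baseline.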
Caveat: This is a microbenchmark and performance may be different in real applications. I think garbage collection is a wonderful thing, and I do not advocate abandoning it for object pools except when really necessary for performance reasons (preferably after profiling your code first). Comments and critiques are welcome.
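Returning to the restriction that methods 3, 4, and 5 cannot be used in recursive methods: the underlying problem is that any nested call reusing the same temporary clobbers it while an outer call still needs it. The sketch below shows this with a hypothetical FieldCross class (technique [4]) and a simplified Vec3 stand-in for Vector3d; a nested call is used instead of true recursion to keep it short. A synchronized method ([5]) does not help, because Java monitors are reentrant, and a ThreadLocal ([3]) returns the same object to every call on the same thread.

```java
// Hypothetical stand-in for Vector3d.
final class Vec3 {
    double x, y, z;

    Vec3(double x, double y, double z) {
        this.x = x; this.y = y; this.z = z;
    }
}

// Technique [4]: the temporary lives in a private field (hypothetical class).
final class FieldCross {
    private final Vec3 temp = new Vec3(0, 0, 0);

    // a = a x b, using the shared field as scratch space.
    void cross(Vec3 a, Vec3 b) {
        temp.x = a.y * b.z - a.z * b.y;
        temp.y = a.z * b.x - a.x * b.z;
        temp.z = a.x * b.y - a.y * b.x;
        a.x = temp.x; a.y = temp.y; a.z = temp.z;
    }

    // a = a x (b x c), parking b x c in the same field. The nested call to
    // cross() then overwrites temp while still reading it as its argument b.
    void tripleBroken(Vec3 a, Vec3 b, Vec3 c) {
        temp.x = b.y * c.z - b.z * c.y;
        temp.y = b.z * c.x - b.x * c.z;
        temp.z = b.x * c.y - b.y * c.x;
        cross(a, temp);
    }

    // Same computation, but b x c goes into a freshly allocated vector
    // (technique [2]), so the nested call has nothing to clobber.
    void tripleWithNew(Vec3 a, Vec3 b, Vec3 c) {
        Vec3 bc = new Vec3(b.y * c.z - b.z * c.y,
                           b.z * c.x - b.x * c.z,
                           b.x * c.y - b.y * c.x);
        cross(a, bc);
    }
}
```

With a = (1, 2, 3), b = (1, 0, 0), and c = (0, 1, 0), tripleWithNew produces the correct (2, -1, 0), while tripleBroken produces (2, 5, 1).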