Basic Java code optimisation

Title says it all.
In essence, good programming habits one could (and should) have in order to make one's program faster and more stable. Or possibly, what you think are good programming habits in general.

I know there are resources for this kind of thing, but perhaps hearing it from the members of the forum might give a different light on things.

I barely know about this kind of thing, so the only two things that pop to mind immediately are bit-shifting and taking conditions out of loops whenever possible. I guess reusing the same constant values (100, 100 instead of 101, 102; that kind of thing) might help as well.
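
To illustrate what I mean by taking conditions out of loops, something like this (a made-up example, not from real code):

	// Made-up sketch: the flag never changes inside the loop,
	// so test it once instead of on every pass.
	static long sum(int[] values, boolean skipNegatives) {
		long total = 0;
		if (skipNegatives) {
			for (int v : values) {
				if (v >= 0) total += v;
			}
		} else {
			for (int v : values) {
				total += v;
			}
		}
		return total;
	}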

Any suggestions ?

Optimizations in my perspective:

  1. Make a well-educated guess as to whether the module you are developing is going to be performance intensive. If you're building a collision detection algorithm, a particle engine, or anything with many, many tightly packed iterations, you might need to design with optimization in mind. Otherwise don't, and just concern yourself with proper code design.

  2. If you didn't do (1) [and programmers are generally horrible at doing (1) properly] and you begin experiencing performance issues, use a profiler to determine where your bottlenecks are, and optimize specifically those bottlenecks. What in particular you do to optimize them depends on what the code is doing.

Thanks for the reply,
but that is pretty general, and my sort of question includes programming habits, so I guess that would include the approach to code design too. I was thinking of performance improvements more along the lines of the basic stuff I mentioned.

There’s probably two basic principles I follow when developing:

  1. KISS (Keep It Simple Stupid), i.e. avoid ‘clever’ or overly complex code, keep it nice and maintainable.

  2. http://www.c2.com/cgi/wiki?RulesOfOptimization

Well, the two don't always travel in the same direction. Usually code optimizations steer away from clean code design in order to reduce indirection, simplify the execution path, etc.

That said, if you're looking for common code design patterns, here is a good list I found of some of the more common ones. The only note I would make is that in that list you'll see singletons and service locators. Singletons are incredibly controversial (as are service locators), but the service locator is considered the lesser of the two evils; avoid both of them if you can:
http://www.javacamp.org/designPattern/

I hate when people say to avoid Singletons. You know what makes them “bad”? The fact that people use them when they shouldn’t.

In my opinion, you should learn any and every possible aspect and technique that you can. Not only so you know what you have available to you, but so you also know when and how to use it. By saying “avoid Singletons” that’s basically saying that they are obsolete or show no usefulness, which isn’t true at all.

In my experience, I find that the people that suggest this the most are the people who just heard it elsewhere and are blindly passing it forward.

Anyway, there are several techniques that can be used to optimize code. A few that I learned in my time with C and code optimization:

instruction scheduling
loop unrolling
code motion
reduction in strength
reduction in code duplication
inlining

I’ll leave it to you to learn about them.
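
But just to give a rough flavour of a couple of them, code motion and strength reduction look something like this (a made-up before/after sketch; the variables are invented for illustration):

	// Before: the invariant product a * b is recomputed, and the index
	// is produced with a multiply, on every iteration.
	static void scaleBefore(int[] in, int[] out, int n, int stride, int a, int b) {
		for (int i = 0; i < n; i++) {
			out[i * stride] = in[i * stride] * (a * b);
		}
	}

	// After code motion (hoist a * b out of the loop) and strength reduction
	// (step the index by addition instead of multiplying by i each time).
	static void scaleAfter(int[] in, int[] out, int n, int stride, int a, int b) {
		int scale = a * b;
		int pos = 0;
		for (int i = 0; i < n; i++) {
			out[pos] = in[pos] * scale;
			pos += stride;
		}
	}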

The extent to which optimization is helpful is proportional to the complexity of your algorithms. The complexity of a program's design is going to be proportional to the knowledge of the programmer. So if you do not know what qualifies as an optimization, then you probably don't need to optimize.

Look to see if your game is slow due to graphics or due to engine code. If graphics are the bottleneck, then optimizing Java source won't help.

Learning bitwise operations and using powers of two for array indexing is a simple improvement.
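
For example (a sketch; the buffer and its size are made up), with a power-of-two length the wrap-around can be done with a mask instead of a modulo:

	// Power-of-two trick: length 1024, so index % 1024 becomes index & 1023
	// (equivalent for non-negative indices).
	static final int SIZE = 1024;
	static final float[] buffer = new float[SIZE];

	static float read(int index) {
		return buffer[index & (SIZE - 1)];
	}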

If physics is a bottleneck, try replacing arrays of Rectangles with primitive arrays holding the data interleaved: [x0, y0, w0, h0, x1, y1, w1, h1, x2, y2, w2, h2, …]. Such changes may help because sequential memory access is faster than random memory access, and objects add an extra level of indirection. If hundreds of objects get accessed tens of thousands of times, you will be accessing memory in an irregular pattern and getting poor cache performance. Making such a change improved my brute force algorithm's speed by a factor of 16. (Sometimes brute force is faster if you know what you are doing. This code took a third of my loop time, so it helped in my case.)
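
Roughly, the layout and access look like this (simplified sketch, not my actual collision code):

	// One primitive array, four entries per body: [x, y, w, h, x, y, w, h, ...]
	static boolean overlaps(float[] bodies, int i, int j) {
		int a = i * 4, b = j * 4;
		return bodies[a]     < bodies[b] + bodies[b + 2]      // a.x < b.x + b.w
		    && bodies[b]     < bodies[a] + bodies[a + 2]      // b.x < a.x + a.w
		    && bodies[a + 1] < bodies[b + 1] + bodies[b + 3]  // a.y < b.y + b.h
		    && bodies[b + 1] < bodies[a + 1] + bodies[a + 3]; // b.y < a.y + a.h
	}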

Do not attempt optimization beyond the scope to which you understand how a computer works at a low level. Remember that Java compilers (at least HotSpot) are very good. What C users might consider a necessary optimization might be unnecessary or detrimental for Java performance.

Edit for Troncoso posting:

instruction scheduling - Possibly helpful; the compiler can usually do this.
loop unrolling - Not generally useful in Java or C because it makes the code take up more instruction memory. The compiler is also capable of doing it automatically if it would help.
inlining and reduction in code duplication - These are opposite goals. Java can inline most methods, including non-final ones. Overzealous inlining is harmful for the same reason as loop unrolling.

By saying “use singletons” you're saying you know for a fact that no one is ever going to need a second instance of your class, which is even more untrue.

Singletons aren't obsolete and they aren't useless, but they make refactoring code incredibly difficult, they make isolating your code difficult, and they hide their dependencies.

Finally, I said avoid using singletons/service locators; I didn't say never use them. The chance of a newbie using a singleton properly (and I am sure people could argue all day about whether that is actually possible, an argument I won't involve myself in) is very low.

Never say never – wait, singletons say never.

When you do optimize, first focus on optimising the algorithm rather than the code.

[quote]Edit for Troncoso posting:

instruction scheduling - Possibly helpful; the compiler can usually do this.
loop unrolling - Not generally useful in Java or C because it makes the code take up more instruction memory. The compiler is also capable of doing it automatically if it would help.
inlining and reduction in code duplication - These are opposite goals. Java can inline most methods, including non-final ones. Overzealous inlining is harmful for the same reason as loop unrolling.[/quote]
There is a time and a place for all of these techniques. I never said use them all at once. He wanted some actual optimization techniques, so I threw some out there. Honestly, I don’t really think optimization is a big deal in Java if you understand the language and understand programming logic.

Oh, loop tiling is another one. I wasn't a fan of it, but it's something. It can reduce cache misses, though you'd have to understand how the cache works to implement it properly.
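
Roughly, the idea looks like this (a generic sketch with made-up sizes; in Java a 2D array is an array of rows, so the effect is weaker than in C, but the shape of the transformation is the same):

	// Naive transpose: dst is written row by row, but src is read column by
	// column, which strides through memory and thrashes the cache.
	static void transposeNaive(float[][] src, float[][] dst, int n) {
		for (int i = 0; i < n; i++)
			for (int j = 0; j < n; j++)
				dst[j][i] = src[i][j];
	}

	// Tiled version: work on TILE x TILE blocks so both arrays' working sets
	// stay small enough to remain in cache.
	static final int TILE = 32; // made-up block size

	static void transposeTiled(float[][] src, float[][] dst, int n) {
		for (int i0 = 0; i0 < n; i0 += TILE)
			for (int j0 = 0; j0 < n; j0 += TILE)
				for (int i = i0; i < Math.min(i0 + TILE, n); i++)
					for (int j = j0; j < Math.min(j0 + TILE, n); j++)
						dst[j][i] = src[i][j];
	}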

Not really. In 2013, hand-applying an optimization whose Wikipedia page uses the phrase “is an optimization performed by compilers” is redundant at best and counterproductive at worst.

There is no place for inlining at the source code level when it would be equally convenient to put the code in a method. A manually inlined C function is a bad code smell. I think some people still put an inline keyword in a function header, knowing the compiler ignores it, to satisfy bosses, but then there are also people who know better and still try to work around the compiler to force inlining.

Loop unrolling (but not loop tiling) is also something that should be avoided. A compiler might benefit from a re-rolling optimization if you had legacy code featuring loop unrolling.

For Java at least if you do not have a compiler that does optimization well, you can use another program as part of your build process to perform that optimization on source or byte code. (Like ProGuard.) Un-“optimized” code is more cross platform, more future proof, and more optimizable.

For many, the optimal code is that which is easiest to read and modify or debug as needed. Are you wanting to optimize the hours you have to spend wrangling with a piece of code?

I’ve been bitten by the functional programming bug. I think it helps with keeping code clean and clear and can lend itself to parallel processing when done correctly, which is a plus as things go more and more multi-core.

You will be surprised by what is or isn't handled by today's compilers. Actually, there are a surprising number of common bad habits that are dealt with, such as a simple calculation inside a loop condition test. That's not to say it's okay to do this, though!

So, if it really matters, profile and verify. “Agile” programming recommends, though, not adding features until they are needed, and I think this would include attempts at optimizing that go against basic good form, clarity, or common sense.

Martin Fowler is a good author on the topic of optimizing for readable style.

I just did a search on “functional programming java” and found the following article. Looks like it might be interesting.
http://www.ibm.com/developerworks/java/library/j-fp/index.html
There’s some references at the end that might be more readable.

Not really what? I said there is a time and place for these. Maybe the time isn’t 2013. Haha. Don’t look so much into it. If nothing else, I think that kind of stuff is fun to learn about.

If it’s not broken, don’t fix it. :point:

Always have to create new versions to get sales :stuck_out_tongue:

Indeed. If it ain’t broke, add more stuff to it and break it all over again.

I recently spent some time profiling a procedural FM synth program I’ve been working on, trying to find areas of improvement. This is what I came up with:

Instead of “/ 2” (integer division by 2), “>> 1” really does perform better. However, “* 2” and “<< 1” gave me identical performance times. Is this compiler dependent?

Instead of creating a new double array with each iteration of a crucial loop, clearing an existing array and reusing it was significantly faster. Probably saves on garbage collection, too.
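
In code, the change was basically this shape (simplified; the field name and size here are stand-ins, not my actual synth code):

	// Allocate the working buffer once and clear it per pass,
	// instead of "new double[...]" inside the crucial loop.
	private static final int BUFFER_SIZE = 4410;   // stand-in size
	private final double[] scratch = new double[BUFFER_SIZE];

	void processChunk() {
		java.util.Arrays.fill(scratch, 0.0);       // clear and reuse
		// ... fill scratch with this chunk's samples ...
	}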

Instead of using Math.sin(), a lookup table of 1024 entries covering a single sine wave, combined with linear interpolation, performed significantly faster. I was a bit surprised because with sin() one can just plug in a double, but with a lookup table, for decent accuracy, you have to do two lookups and compute the linear interpolation, i.e., a lot more fussing. In spite of this, the lookup method still won.
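
The lookup itself is only a few lines; this is the general shape of it (reconstructed from memory rather than pasted from the synth, so take the details as illustrative):

	// 1024-entry table over one sine cycle, read with linear interpolation.
	static final int TABLE_SIZE = 1024;
	static final double[] SINE = new double[TABLE_SIZE];
	static {
		for (int i = 0; i < TABLE_SIZE; i++) {
			SINE[i] = Math.sin(2 * Math.PI * i / TABLE_SIZE);
		}
	}

	// phase is in [0, 1) over one cycle of the wave
	static double sineLookup(double phase) {
		double x = phase * TABLE_SIZE;
		int i = (int) x;
		double frac = x - i;                          // position between the two entries
		double a = SINE[i & (TABLE_SIZE - 1)];        // mask works because 1024 is a power of two
		double b = SINE[(i + 1) & (TABLE_SIZE - 1)];
		return a + frac * (b - a);                    // linear interpolation
	}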

Really stupid (and what caused me to have to look for the source of the performance drop-off): I came up with a totally inefficient way to do error checking, in that the call had a String concatenation (to identify the location in the code) in its parameter list. It wasn't obvious to me that this was getting executed on every call, since it was buried within the line of code. (And I hadn't made this error before. I seem to need to make every possible error at LEAST once, usually more, before I learn better.) That turned out to be the biggest culprit. If there is any String activity needed at all, keep it out of the sections that require any degree of performance.
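
The mistake had roughly this shape (names changed; the "check" helper here is just a stand-in for my error-checking call):

	// Bad: the String is concatenated on every call, even when nothing is
	// wrong, because arguments are evaluated before the method is entered.
	void processSample(double value) {
		check(value >= 0, "bad sample in " + getClass().getName() + ".processSample()");
	}

	// Better: only build the String when the check actually fails.
	void processSampleFixed(double value) {
		if (value < 0) {
			throw new IllegalStateException("bad sample in processSample(): " + value);
		}
	}

	static void check(boolean ok, String message) {
		if (!ok) throw new IllegalStateException(message);
	}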

The nice thing was that this error forced me to have to profile, and I found the other stuff in the process.

Maybe these suggestions are obvious or basic for most of you. I’m admittedly a self-taught intermediate level Java programmer with a LOT to learn still.

Division is slow. (Though division by a constant may be slightly faster.) Multiplication is better. I do not know the exact difference between addition, subtraction, shifts, and bitwise operations (which are the fastest two-operand operations), but they are fast enough that integer division is sometimes implemented faster (in assembly code only) using a combination of multiplication and other instructions. Multiplication by a power of two is probably optimized right off the bat. Division by two, on the other hand, cannot be replaced by a shift automatically, because the two are not exactly equivalent: -1 >> 1 == -1, while -1 / 2 == 0.
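
To make that exception concrete (two lines you can drop into any main method):

	System.out.println(-1 / 2);   // prints 0  (integer division rounds toward zero)
	System.out.println(-1 >> 1);  // prints -1 (arithmetic shift rounds toward negative infinity)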

I am a little surprised the array allocation was not optimized away, even though I would reuse the array to begin with and assume it wasn't. Table lookup may perform differently on different platforms, but it can be a good idea when you do not need a perfect drop-in replacement. The String advice is also good; Strings don't belong in most functions unless they are within an if block that ends with an exception being thrown. How did you find the division thing using a profiler? Was it just a coincidence that you noticed it?

Thanks for the review! Nice to get some confirmation. I am not totally up on profiling techniques–am wondering if anyone knows of good tutorials on the subject. I have used jconsole and jvisualvm, but don’t really feel like I am using them to their true potential.

I mostly did the measurements “in context”. My synth is set up to play a “note” by iterating through an “envelope”. Normally, the output is directed to a SourceDataLine (a blocking queue) which limits the execution speed. Instead, I set the synth to just overwrite the output without blocking, allowing it to run at full speed.

Since the inner loops execute 44100 times for every second of sound being processed, and the envelope was set to about 6 or 7 seconds, running a single note and timing the duration seemed like a reasonable test, covering over 200,000 iterations. I'd compare execution times by changing one element. For example, using a Math.sin() function in a single oscillator gave me a total running time of about 9 millis for those “6 or 7 seconds” of audio. Using the table lookup method with interpolation, the otherwise same code took about 300 microseconds! (I checked: when used as designed, the two methods are acoustically identical to my ear, and I have a reasonably good ear for this sort of thing. Plus, people with an actual acoustical engineering background have assured me that linear interpolation should be adequate for this.)

I was looking for my notes on the div, and discovered I did that test by putting the operation in a loop. I don't think I had any /2's in the synth code to use for testing. The test code I wrote is below: a very simple loop. Just uncomment the line you want to test. Of course every time you run it, it will produce a different outcome, and if you do two or three runs in a row, the later runs will execute more quickly. Still, I found that * 2, << 1, and >> 1 were all in the same ballpark, with / 2 being an order of magnitude slower (relatively) for that number of iterations. I don't know exactly how valid this sort of test is.


	// Minimal, self-contained version of the test; uncomment one line in the loop.
	public class DivShiftTest {
		public static void main(String[] args) {
			@SuppressWarnings("unused")
			double a;

			long startTime = System.nanoTime();
			for (int i = 0; i < 10000000; i++) {
				// uncomment one of the following...
//				a = i / 2;
//				a = i >> 1;
//				a = i * 2;
//				a = i << 1;
			}
			long endTime = System.nanoTime();
			System.out.println("elapsed: " + (endTime - startTime));
		}
	}

I hate accumulating those yellow warnings in Eclipse–hence the annotation.
Using i in the operation being tested is an attempt to lessen the potential amount of caching.

My general experience is that micro-benchmarking is more valuable than profiling. Don't tell anyone, though. The problem with profiling is that you cannot get a very accurate picture of little things. Sampling misses things, and full profiling creates stalls that would not normally exist. I once profiled code that involved generating random numbers, and the profiler did not tell me that the RNG was the problem, even though changing it increased speed by 50%. Yours is an optimization story I would recommend people read.

If you are curious why the sound is no different, it is because sound can be interpreted as a superposition of sine waves. If you subtract your base-frequency sine wave from your interpolated sine wave, you get a function with tiny blips. If you fill in the difference with small-amplitude high frequencies, you get the real (interpolated) function.

If you do it with 8 points, the difference between the big low-frequency amplitude and the small high-frequency amplitudes is like a whisper in a loud, crowded room, and since they are all overtones of the original you won't notice them by ear anyway. If you double the number of points, the difference is even greater. If you have thousands of points, the difference is negligible. At that point the low-frequency overtones are eliminated (the first thousand frequency multiples, I think), and the very high-frequency overtones are much, much smaller because the difference is smaller. They end up being too high and too quiet to hear or even to produce through speakers, so you hear the intended frequency only.

Edit: That assumes a sound system with an infinite sampling rate and infinite-precision numbers, or that your precalculated wave's frequency divides evenly into the sampling rate. People can't hear below 20 Hz or above 20000 Hz, so…