C/C++, Python, C#, Java benchmarks. Results surprising

hello,

I found an interesting article on the net with benchmarks comparing all the most popular languages. The results are surprising, especially the JDK 1.4.2 results.

Here:

http://www.osnews.com/story.php?news_id=5602&page=1

Yes, that’s the same site that was mentioned one topic earlier, in the thread named “for Jeff… microbenchmarking.”

BUT… since I love well-named topics (and can’t stand badly named ones), I clearly prefer yours. :slight_smile:

From the article:
“What lessons can we take away from all of this? I was surprised to see the four .NET 2003 languages clustered so closely on many of the benchmark components, and I was astonished to see how well Java 1.4.2 did (discounting the trigonometry score). It would be foolish to offer blanket recommendations about which languages to use in which situations, but it seems clear that performance is no longer a compelling reason to choose C over Java (or perhaps even over Visual J#, Visual C#, or Visual Basic)–especially given the extreme advantages in readability, maintainability, and speed of development that those languages have over C. Even if C did still enjoy its traditional performance advantage, there are very few cases (I’m hard pressed to come up with a single example from my work) where performance should be the sole criterion when picking a programming language. I would even argue that that for very complex systems that are designed to be in use for many years, maintainability ought to trump all other considerations (but that’s an issue to take up in another article).”

  • emphasis mine

As it says, excluding the Trig test, it finds Java and .NET C++ neck ’n’ neck. Nice :slight_smile:

It’s still only a handful of very limited cases (as most benchmarks are), but the article still helps the general attitude towards Java. I was happy with the results in the article too :slight_smile:

Erik

Why the hell is trig so slow in the benchmark?

[quote]Why the hell is trig so slow in the benchmark?
[/quote]
From an article in the other topic about the same benchmark:[quote]

No prizes, but that is the problem. When reducing ‘large’ arguments, Intel use a 66-bit version of pi, which is inadequate to maintain the precision required by Java. They also restrict the domain of the trig functions to rather less than the full range of double. Java requires that Math.sin(1.0e300) produce a result accurate to within 1 lsb.

http://developer.java.sun.com/developer/bugParade/bugs/4345903.html
[/quote]
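
To make the quoted point concrete, here’s a small Java sketch (mine, not from the bug report) showing how badly a naive reduction using the 53-bit double value of pi disagrees with Math.sin, which the spec requires to be accurate even for huge arguments:

[code]
public class TrigReduction {
    public static void main(String[] args) {
        double x = 1.0e300;
        // naive range reduction using the 53-bit double approximation of 2*pi
        double naive = Math.sin(x % (2.0 * Math.PI));
        // Math.sin must do the reduction with (effectively) full-precision pi
        double exact = Math.sin(x);
        // the two results bear no resemblance to each other
        System.out.println(naive + " vs " + exact);
    }
}
[/code]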

Thanks.

I wouldn’t class that as a bug, though, as the high precision could be useful to some people. I believe it should use less precision by default, with an option of full precision for those who need it.

Precision is a big issue with Intel chips.
When writing PC games, the first few lines in my code always set the FPU precision to ‘single’, since divides, trig, and sqrt calls are almost double the speed.

The dependence on doubles for some of the maths functions is irritating in Java. Which is more common: Java games, or high-precision Java mathematical and engineering applications? Which is the main focus for Java development at Sun?

  • Dom

It would be nice to have a FloatMath to go next to what is effectively a DoubleMath (java.lang.Math), if only to avoid having to cast the results back to floats constantly…
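
For what it’s worth, a sketch of that wrapper is only a few lines (the class and method set here are hypothetical, just delegating to java.lang.Math and casting once so the call sites stay clean):

[code]
// Hypothetical FloatMath: float overloads delegating to java.lang.Math.
// This hides the casts, though the work is still done in double underneath.
public final class FloatMath {
    private FloatMath() {}
    public static float sin(float a)  { return (float) Math.sin(a); }
    public static float cos(float a)  { return (float) Math.cos(a); }
    public static float sqrt(float a) { return (float) Math.sqrt(a); }
    public static float atan2(float y, float x) { return (float) Math.atan2(y, x); }
}
[/code]

Of course this only tidies the syntax; the JVM would still pay the double-precision cost inside.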

I understand that float versions of the Math methods were originally omitted because the early compiler couldn’t distinguish between methods that differed only in float vs. double arguments. Why this defect still hasn’t been fixed is another question.

As far as Intel processors are concerned, when using the normal FPU instructions there is little performance advantage in float over double. You only get such an advantage when using SSE instructions, which the JVM has only recently been enhanced to use. I gather that most RISC processors do show a significant performance benefit when using float instead of double.

As general practice, developers should use double until they know they have a performance problem, and they should check their calculations very carefully for numerical instability when converting to float. It is far easier to get wrong answers due to the lack of precision with float than with double.
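
A quick way to see the kind of instability meant here (my example, not a benchmark): accumulate a large number of small terms and watch the float total drift once the sum dwarfs each term:

[code]
public class FloatDrift {
    public static void main(String[] args) {
        float sumF = 0f;
        double sumD = 0.0;
        for (int i = 0; i < 10000000; i++) {
            sumF += 0.1f;  // rounding error grows as the sum gets large
            sumD += 0.1;
        }
        System.out.println("float:  " + sumF);  // visibly wrong leading digits
        System.out.println("double: " + sumD);  // very close to 1000000
    }
}
[/code]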

[quote]As far as Intel processors are concerned, when using the normal FPU instructions there is little performance advantage in float over double
[/quote]
Intel FPU timings (PII+):

FDIV:
Double: 32 cycles
Float: 17 cycles

FSQRT:
Double: 57 cycles
Float: 28 cycles

FSIN/FCOS timings are dependent on the number you are using, but in general float code is nearly double the speed.

SSE instructions (PIII) only deal with floats. You need SSE2 to get packed double-precision calculations.

For 99% of games, I would recommend using float precision at all times. In fact, the PS2 doesn’t even have a double-precision FPU!

The only cases where errors were noticeable were during complex collision and physics, but then again, for games we only need perceptible quality, not mathematical rigour. The near-doubling of maths performance is definitely a win for games development.

[quote]but it seems clear that performance is no longer a compelling reason to choose C over Java (or perhaps even over Visual J#, Visual C#, or Visual Basic
[/quote]
C/C++ has one major advantage for performance-critical code: inline assembler. Sadly, performance and readability/maintainability/portability rarely go hand in hand.

  • Dom

On the other hand, the timings for multiplication and addition/subtraction are nearly identical. It is common to arrange calculations to avoid division as far as possible, so unless your code consists almost entirely of the slower ops like division, a more typical mix will show no great advantage for float.
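
The classic rearrangement, for reference, is to hoist a loop-invariant divide out of the loop and multiply by the reciprocal instead. A sketch:

[code]
// One divide total instead of one per element. Note that v[i] * inv can
// differ from v[i] / scale in the last bit, which games usually tolerate.
static void scaleDown(float[] v, float scale) {
    float inv = 1f / scale;
    for (int i = 0; i < v.length; i++) {
        v[i] *= inv;
    }
}
[/code]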

The performance difference between addition/multiplication and division is partly because division is more complicated, but more importantly because Intel (and other processor designers) put less effort into optimising it.

You are absolutely right: with careful pipelining the adds/muls can be executed at nearly one instruction per clock cycle. However, the cases I am considering are inner loops for perspective transforms. These would typically involve a div or sqrt, and would account for 30–40% of CPU time. The whole loop per vertex, including clip testing, would need to be under 100 cycles, and the difference of 18 non-pairable cycles you get with a divide really hurts.
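
A hypothetical shape for that kind of loop (invented names, clip testing omitted): one reciprocal per vertex, reused for both screen coordinates, so the divide cost is paid once rather than twice:

[code]
// Projects camera-space points to screen space; zs[i] > 0 is assumed.
static void project(float[] xs, float[] ys, float[] zs,
                    int[] sx, int[] sy, float focal, int cx, int cy) {
    for (int i = 0; i < zs.length; i++) {
        float invZ = focal / zs[i];         // the one divide per vertex
        sx[i] = (int) (xs[i] * invZ) + cx;  // the rest is mul/add
        sy[i] = (int) (ys[i] * invZ) + cy;
    }
}
[/code]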

As with all optimisations, it only matters in the parts of the code that are run the most, so for 90% of the maths it doesn’t matter. It’s the other 10% of cases (the inner loops) where lowering the precision can make a significant difference.

Other instructions we found ‘troublesome’ are the float-to-int conversions needed to get from our maths to screen coordinates, and the float comparison operations (very nasty on x86). This is where C++ unions (or inline assembler) can give you a nice speed-up as well.
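
The closest Java gets to that union trick is the raw-bits view of a float. For finite, non-negative values the IEEE bit patterns sort in the same order as the numbers, so an integer compare can stand in for an FPU compare (a sketch; profile before trusting it to be faster on any given VM):

[code]
// Valid only for finite, non-negative floats (NaN and negatives break it).
static boolean lessNonNegative(float a, float b) {
    return Float.floatToRawIntBits(a) < Float.floatToRawIntBits(b);
}
[/code]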

Personally, I would love to see an ‘inline bytecode’ facility for Java so we could use instructions that are not available via the standard C-like language (bitwise rotates and add/subtract with carry spring to mind), as well as being able to reduce the bytecode counts for certain inner loops (my sprite rendering routines would benefit immensely from this).
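
For the rotate case specifically there is at least a pure-Java spelling that a JIT could in principle collapse to a single instruction, even without inline bytecode:

[code]
// Rotate left by n; Java masks shift counts mod 32, so n = 0 and n = 32
// both fall out correctly.
static int rotateLeft(int x, int n) {
    return (x << n) | (x >>> (32 - n));
}
[/code]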

  • Dom

Division is so slow relative to multiplication that, when accuracy requirements aren’t high, it can be worth computing the reciprocal using a few terms of a series expansion and then multiplying by that.
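
One way to flesh that out (a Newton–Raphson sketch rather than a literal series, and the standard one from the division-algorithm literature): with the divisor normalised to [0.5, 1], a cheap linear seed plus two refinement steps gives roughly 16 good bits using only adds and multiplies:

[code]
// Assumes 0.5f <= d <= 1f. Each step x -> x*(2 - d*x) roughly doubles
// the number of correct bits; two steps is plenty when accuracy
// requirements aren't high.
static float approxRecip(float d) {
    float x = 48f / 17f - (32f / 17f) * d;  // linear seed: ~4 correct bits
    x = x * (2f - d * x);                   // ~8 bits
    x = x * (2f - d * x);                   // ~16 bits
    return x;
}
[/code]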