There’s a widespread misconception that doubles are somehow ‘better’ than floats, and this is simply not the case. They simply have more precision, and you pay for that in memory usage and (usually) execution time (some ops have the same execution time for both). It’s true that some usages require the extra precision of doubles, but it’s equally true that for many usages floats are the appropriate choice. I find this somewhat strange, because you don’t find people thinking that 64-bit integers are somehow ‘better’ than lower-bit-width integers; they happily use whichever is most appropriate for the given usage.
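To make the memory half of that trade-off concrete, here’s a trivial sketch (the array sizes are arbitrary, just for illustration):

    // A float[] is half the size of a double[] of the same length, which also
    // means half the cache and memory-bandwidth pressure when streaming it.
    float[]  f = new float[1 << 20];   // 1M elements * 4 bytes = 4 MiB
    double[] d = new double[1 << 20];  // 1M elements * 8 bytes = 8 MiB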
Not at all, they are simply not exposed at the high level. It’s the compiler’s job to use these instructions (the runtime’s, in the case of a VM). I’d expect vectorized support to improve, as runtimes like Mono have been working on this front. It’ll probably require a new compiler framework, as Sun’s seems to be showing its age (I haven’t paid attention to what the next gen is up to). Both vmkit and Shark/Zero are based on LLVM, so they have “promise” in this respect (although LLVM needs work on auto-vectorization as well). If we had a VM that performed a reasonable amount of vectorization, there would be a huge speed difference between floats and doubles on SIMD hardware. But ignoring SIMDifying, scalar float operations are generally faster than double (and never slower) for all CPU architectures that I’m familiar with. Couple this with the ever-increasing speed gap between main memory and the CPU, and moving data becomes more and more of an issue.
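For anyone who wants to poke at the scalar case themselves, here’s a minimal, unscientific micro-benchmark sketch (my own throwaway harness, not something from the thread). It times a chain of dependent multiplies so the loop is latency-bound rather than memory-bound, and prints the results so the JIT can’t dead-code-eliminate the work:

    public final class FloatVsDouble {
        static float mulFloat(int n) {
            float acc = 1f;
            for (int i = 0; i < n; i++) acc *= 1.0000001f;  // dependent multiplies
            return acc;
        }
        static double mulDouble(int n) {
            double acc = 1.0;
            for (int i = 0; i < n; i++) acc *= 1.0000001;
            return acc;
        }
        public static void main(String[] args) {
            int n = 100000000;            // 100M iterations; stays in float range
            mulFloat(n); mulDouble(n);    // warm up so the JIT compiles both loops
            long t0 = System.nanoTime(); float  f = mulFloat(n);
            long t1 = System.nanoTime(); double d = mulDouble(n);
            long t2 = System.nanoTime();
            System.out.println("float:  " + (t1 - t0) / 1e6 + " ms (" + f + ")");
            System.out.println("double: " + (t2 - t1) / 1e6 + " ms (" + d + ")");
        }
    }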
Not sure what doesn’t make sense. These are measures of the time required for the execution unit to complete the computation.
Your example appears to be memory bound. As a simple counter-example: the speed difference between float and double multiply is much narrower than for divide, but if you take my last minimax sin example and simply change it from float to double, you’re likely to see a speed difference of 1.2-2.0x depending on your hardware (and assuming the Sun VM).
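For reference, here’s a sketch in the spirit of that example; the coefficients below are plain Taylor terms standing in for the actual minimax coefficients (which I’m not reproducing here), so the accuracy differs but the instruction mix is the same. The experiment is just swapping every float and f-suffix for double:

    // Degree-7 polynomial in Horner form, reasonable near [-pi/2, pi/2].
    // Taylor coefficients used here as placeholders for the minimax ones.
    static float sinApprox(float x) {
        float x2 = x * x;
        return x * (1f + x2 * (-1f/6f + x2 * (1f/120f + x2 * (-1f/5040f))));
    }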
The question of the performance of floats vs. doubles is really a hardware question (and the indicated thread was about hardware), and my original statement was to that effect. My follow-up was in reply to “IMHO double is probably faster or at least same speed (and for sure will become faster)”, which I claim is incorrect. Is it possible to have two otherwise identical routines where the double version is faster than the float? Yes, but that will be a case where the stalls of the double version are being hidden by some other stall (probably a memory read or write) more than in the float version. This will be a rare occurrence and tightly coupled to an exact hardware configuration, and it’s more of a desktop issue, for CPUs with many functional units.
I’d be happy to be proven wrong with a counter-example or a pointer to a CPU specification!