On magnitude and precision of floating-point values

Hi there,

Because there was some discussion about the difference between scale (i.e. magnitude) and precision of floating-point values, I decided to write a small article trying to explain what these terms actually mean and where the difficulties lie when working with floating-point values.

Our computers use the IEEE 754 format for representing floating-point values. This is a very clever representation with one goal:
to represent values that are very large in magnitude, while at the same time also representing reasonable differences between such values, relative to the magnitude of the values.

(as a sidenote: I will use the US notation for decimal values, meaning that the decimal separator will be the dot and the thousands separator is the comma)

By "relative" I mean: if you have some value 1.0, you would likely also want to express 1.001.
And if you have a value as big as 100,000.0, you are likely to also want to represent 100,020.0 along with it, but probably not 100,000.001.
So the bigger (in magnitude) the values become, the bigger must also be the differences you want to represent between them.
This makes sense.

To achieve this, IEEE 754 32-bit floating-point values use a decomposition of the available 32 bits into 1 bit for the sign, 8 bits for the exponent and the remaining 23 bits for the mantissa.
A numerical value represented in IEEE 754 is then simply computed as: (-1)^sign * mantissa * base^exponent.
The mantissa holds the significant digits of our number.
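To make the decomposition concrete, here is a small sketch of how the three fields can be extracted from a 32-bit float in Java via Float.floatToIntBits() (the variable names and the sample value -6.25f are my own choices):

```java
public class FloatBits {
    public static void main(String[] args) {
        float f = -6.25f; // = -1.5625 * 2^2 in normalized binary form
        int bits = Float.floatToIntBits(f);

        int sign     = (bits >>> 31) & 0x1;  // 1 bit:   1 means negative
        int exponent = (bits >>> 23) & 0xFF; // 8 bits:  stored with a bias of 127
        int mantissa = bits & 0x7FFFFF;      // 23 bits: fraction, implicit leading 1

        System.out.println("sign     = " + sign);             // 1
        System.out.println("exponent = " + (exponent - 127)); // 2
        System.out.println("mantissa = " + mantissa);         // 4718592 = 0.5625 * 2^23
    }
}
```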

The base by convention is set to 2, since we are always using binary values. It could also have been some other value, such as 10.
And in fact, throughout this article I will assume it to be 10 to make computations within this article easier to follow.
Changing the base does not alter the effects described here, or how they affect us when working with floating-point values.

Precision and Normalization

Precision is the number of digits you need to accurately represent a given number, regardless of its magnitude.
And those digits are exactly the significant digits stored in the above mentioned 23 bits of the mantissa.
More concretely, in IEEE 754 precision is independent of the actual magnitude of a value (i.e. how absolutely big that value is), because we have the exponent, which can be used to "slide the decimal point around" and normalize almost every value so that its decimal point sits just behind the first significant digit.
With 32-bit floating-point values we can even normalize values with a magnitude as large as a one with 38 zeroes behind it. :slight_smile:

So, using normalization we can represent the absolute values 0.1 and 10,000.0 and even 1,000,000,000.0. (remember we are using 10 as base here)
If you use the scientific notation of those numbers, the normalization process becomes apparent: 1.0E-1, 1.0E4 and 1.0E9.
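The normalization into scientific notation can be reproduced in Java with a %e format string, which always prints one digit before the decimal point (Locale.ROOT is used here so the output does not depend on the system locale):

```java
import java.util.Locale;

public class Normalize {
    public static void main(String[] args) {
        // %e prints the normalized scientific form: one digit before the point
        System.out.printf(Locale.ROOT, "%.1e%n", 0.1);             // 1.0e-01
        System.out.printf(Locale.ROOT, "%.1e%n", 10_000.0);        // 1.0e+04
        System.out.printf(Locale.ROOT, "%.1e%n", 1_000_000_000.0); // 1.0e+09
    }
}
```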

Regarding precision, the following values are equivalent, and none of them can be represented as a 32-bit floating-point entity.
(as is usual with the notation of numbers, any leading zeroes as well as any trailing zeroes are omitted, so 001.00 means the same as 1.0)


You see that you can arbitrarily scale your number. This makes no difference to its representability, because the precision you need effectively stays the same.
If you want to know how much precision you need to represent a given number, you just search for the “left-most” non-zero digit and the “right-most” non-zero digit and look at the distance (in number of digits) between them.
This is your needed precision. The precision of IEEE 754 32-bit floating-point is about 6 to 7 decimal digits.
You can put the decimal point wherever you want when determining the precision. The absolute value (or the magnitude) of your value does not matter here - only the number of total digits you need to represent that number is important.

This section introduced what "precision" means. The next section tries to explain the discretization of floating-point values.
We need to understand this in order to see why we cannot represent arbitrarily large (in magnitude) values while at the same time trying to also have small (in magnitude) differences between them.

Unit in the last place

There can be times when we need very big values, like 149,600,000,000.
Be it because we are implementing a solar system simulator whose world coordinate system is relative to the sun, and we want to express the large distance (in meters) from the sun to our earth.

IEEE 754 has no problem at all representing this big value. After all, in decimal, it only needs 4 digits of precision.
In normalized scientific notation we would just write 1.496E11.
And so does IEEE 754.
Also, if we only consider precision, we can shift the decimal point around as desired and let it just be a 1.496.

Now, imagine you have some character you control standing on the earth’s surface. By coincidence its initial position in our world coordinate-system is at (1.496E11, 0.0, 0.0) or (1.496, 0.0, 0.0).

You press the forward button on your keyboard to make that character move one meter on the earth’s surface in world-space positive x-direction. What happens now?
In theory your character should now be at x-coordinate 149,600,000,001 or 1.49600000001.
But in practice it won't be, because, as you can see, you suddenly need 12 decimal digits of precision to represent that value, whose magnitude only changed a tiny bit.
12 decimal digits of precision is far too much for IEEE 754 32-bit, and therefore, no matter how hard you press the forward button on your keyboard, your character stays still.
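The stuck character can be reproduced in two lines (the value 1.496e11f stands in for the x-coordinate from the example above):

```java
public class StuckCharacter {
    public static void main(String[] args) {
        float x = 1.496e11f;            // ~ sun-to-earth distance in meters
        float moved = x + 1.0f;         // try to move one meter forward
        System.out.println(moved == x); // true: the addition was rounded away
    }
}
```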

You increase the velocity at which your character should move per unit time, in the hope that this fixes the problem.
Now your character suddenly jumps to x-coordinate 149,600,010,000 or 1.4960001, thus moving a whopping "10 kilometers" in one computation step.
But wait, why on earth would it move 10 kilometers in one step?

That's because you increased the velocity of your character so much that the calculated next position, after applying the velocity, reached the next representable floating-point value at that magnitude.

This is known as the "unit in the last place", or just "ulp" for short. An ulp is the smallest change you can add to or subtract from a given floating-point value such that the result is a different representable IEEE 754 floating-point value.

In Java, there is a nice method to compute the ulp for a given floating-point value via java.lang.Math.ulp().
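For example, printing the ulp at different magnitudes shows how the spacing between representable floats grows with the value (16384 meters is the actual binary-float spacing near the sun-to-earth distance, which is why the base-10 figures above are only approximate):

```java
public class UlpDemo {
    public static void main(String[] args) {
        System.out.println(Math.ulp(1.0f));      // 1.1920929E-7 (= 2^-23)
        System.out.println(Math.ulp(1.496e11f)); // 16384.0 -> ~16 km steps near earth's orbit
    }
}
```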

Now, onto noise-generated planets

If your solar system contains some planet whose surface you want to displace via some noise function, you are going to need some parameterization of the planet’s surface to be able to feed a noise function with the coordinates of the point on the surface for which you want to compute the noise value.

For spheres, this can be either some three-dimensional cartesian coordinate with the origin being the planet’s center, or some (at least) two-dimensional parameterization, like longitude and latitude or two angles.

With this parameterization you can supposedly generate a noise value for each and every point on the sphere's surface and zoom in on the surface arbitrarily.
Let’s see whether this really is so:

…to be continued…

Nice article! Impatiently awaiting the next!

You should note that all "normal" FP values have +1 bit of precision (it's implied), so 24 bits in singles. 2^-24 is a very small relative value. (do the math)

The notion that everything has to be defined relative to a single coordinate frame is one reason why people think singles aren't enough. Having a character's position relative to the sun is pointless. The character has no meaningful interaction with the sun. Besides, it's absurdly complex: an idealized model has the earth rotating at a constant (very fast) angular velocity about its axis, which in turn is moving around the sun. Crazy complex, and it addresses no problem. We've already had a thread about these kinds of problems.

@Longarmx: Thank you!