Fast Square Root – Quake III
Computers, and hardware in particular, are fast these days. You can buy a modern GPU and crunch through enormous amounts of computation in almost no time. But it wasn't always that way. Earlier engineers and programmers had to get quite creative and apply a lot of mathematical and computational optimizations. Even today, CPUs compute certain operations faster than others: multiplication, for example, is faster than division, and taking a square root is many times slower than squaring a number.
To be efficient, we always strive to find ways to make code compute faster. This sometimes comes at the cost of a bit of accuracy, and one thing you learn in game development is that accuracy is often the first thing sacrificed. You wouldn't want to drop below a certain number of frames per second.
Table of Contents
- The math of computers
- Logarithms
- Shifting Bits
- Quake III’s approach
- My C++ implementation
- Conclusion
The math of computers
Before explaining how to efficiently calculate square roots, we need to examine how the computer does math and why accuracy and speed are lost during some operations. Computers can only calculate with binary numbers. Binary uses only two digits: 0 and 1.
Human counting
If we consider our normal way of counting, we count from 0 to 9 and then, when we get to 10, we shift and add a second digit on the left. Our numbering system can also be represented by the following formula, where d_i is the digit at position i (counting from the right, starting at 0):

d_n d_{n-1} \ldots d_1 d_0 = \sum_{i=0}^{n} d_i \cdot 10^i

For example, 423 = 4 \cdot 10^2 + 2 \cdot 10^1 + 3 \cdot 10^0. And this goes on: with every digit we add to the left, we increment the power of 10 that we count with.
Computer Counting
Computers count in a similar fashion, but they have only two digits to work with, which means they shift over to a new digit much more often:

b_n b_{n-1} \ldots b_1 b_0 = \sum_{i=0}^{n} b_i \cdot 2^i

This is how a number is represented in binary format. The smallest addressable number type in your computer is a byte, which consists of 8 bits (1 bit being a single positional 0-or-1 value). Usually we also need to represent signed numbers, and in that case the leftmost bit is used for the sign.
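If you want to see this on your own machine, here is a tiny C++ snippet (my own illustration, not from the original article) that prints a number's 8-bit binary form:

```cpp
#include <bitset>
#include <iostream>

int main() {
    // 13 in binary: 8 + 4 + 1 = 00001101
    std::bitset<8> bits(13);
    std::cout << bits << '\n';  // prints 00001101
}
```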
This is at least how computers count with integer numbers. Floating-point numbers are a bit different. They follow something called the IEEE 754 standard. A 32-bit float consists of 3 main parts: the sign S (1 bit), the exponent E (8 bits), and the mantissa M (23 bits).

The exponent is stored with a bias: the 8 bits hold the value E, but the actual exponent used is E - 127, which lets floats represent both very large and very small magnitudes.
The mantissa, on the other hand, represents fractions: its first bit is worth 1/2, the second 1/4, and so on, so the 23 bits together encode the fractional value M/2^{23}, which is added to an implicit leading 1:

1 + M/2^{23} = 1 + b_{22}/2 + b_{21}/4 + \ldots + b_0/2^{23}

The way you turn this into a decimal number is with the following formula:

x = (-1)^S \cdot (1 + M/2^{23}) \cdot 2^{E - 127}
There is also a connection between an integer and a float. If we read the same 32 bits of a positive float as an integer, the value we get is described by the following formula, deriving from how floats are stored in memory (the exponent bits sit 23 places to the left of the mantissa bits):

I = E \cdot 2^{23} + M
Note: Sometimes, when the computer displays a float it has calculated, it may show something like 1.37438953472e+11. This is simply a compact way of saying: take the number 1.37438953472 and multiply it by 10^11 (100 000 000 000), giving the value 137 438 953 472.
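To make the three parts concrete, here is a small C++ sketch (my own illustration, not from the article) that pulls the sign, exponent, and mantissa bits out of a float:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    float x = 6.5f;
    std::uint32_t i;
    std::memcpy(&i, &x, sizeof i);  // reinterpret the float's bits as an integer

    std::uint32_t sign     = i >> 31;           // 1 bit
    std::uint32_t exponent = (i >> 23) & 0xFF;  // 8 bits, biased by 127
    std::uint32_t mantissa = i & 0x7FFFFF;      // 23 bits, the fractional part

    // 6.5 = (1 + 0.625) * 2^2  ->  sign 0, exponent 129, mantissa 0.625 * 2^23
    std::printf("sign=%u exponent=%u mantissa=%u\n", sign, exponent, mantissa);
}
```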
Calculator
You can take a look at this simple calculator representation of floats and integers sharing the same memory space. You can set individual bit values and see how the numbers are calculated from them. I also put dividers between the float's sign | exponent | mantissa fields as a guide.
Logarithms
I want to give a quick reminder of what logarithms are. Here is the main formula defining a logarithm:

\log_b(a) = c \iff b^c = a
It can also be thought of as the inverse of exponentiation.
I want to examine one topic in detail: between 0 and 1, the log function with base 2 of 1 + x returns almost linear values. If we add some constant correction value k, we get a decent approximation using just a linear function:

\log_2(1 + x) \approx x + k, \quad 0 \le x \le 1
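A quick way to convince yourself of this is to compare the two sides numerically. Here is an illustrative C++ sketch (the k value below is just one reasonable choice, not a value fixed by the article):

```cpp
#include <cmath>
#include <cstdio>

int main() {
    const double k = 0.0430;  // assumed correction constant; other choices trade error differently
    for (double x = 0.0; x <= 1.0; x += 0.25) {
        double exact  = std::log2(1.0 + x);
        double approx = x + k;
        std::printf("x=%.2f  log2(1+x)=%.4f  x+k=%.4f  error=%+.4f\n",
                    x, exact, approx, exact - approx);
    }
}
```

The error stays within a few hundredths over the whole interval, which is exactly the "almost linear" behavior we will exploit.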
Shifting Bits
As you may have noticed in the simple bit calculator above, I've added two shift operators that move the bits to the left and right. One handy property is that shifting the bits of an integer to the right divides it by 2 (rounding down), while shifting them to the left multiplies it by 2.
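A minimal demonstration (my own illustration, not from the article):

```cpp
#include <cstdio>

int main() {
    int n = 40;
    std::printf("%d << 1 = %d\n", n, n << 1);  // 80: shifting left multiplies by 2
    std::printf("%d >> 1 = %d\n", n, n >> 1);  // 20: shifting right divides by 2
    std::printf("%d >> 2 = %d\n", n, n >> 2);  // 10: each extra shift divides by 2 again
}
```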
This also works in an interesting way for floats, because shifting a float's bits shifts both the exponent and the mantissa. Recall that we represent floats with the formula mentioned above, together with their integer bit representation:

x = (1 + M/2^{23}) \cdot 2^{E - 127}, \qquad I = E \cdot 2^{23} + M

Then shifting the bits to the right (with some rounding error) approximately gives us:

I >> 1 \approx I / 2

and since, as we will see below, \log_2(x) \approx I/2^{23} + k - 127, halving I approximately halves \log_2(x) up to constant offsets. In other words, a single right shift on a float's bits roughly takes its square root, up to a constant factor.
Quake III's approach
Quake 3 solves the equation of the inverse square root, 1 / sqrt(x). This is quite useful by itself, and we can get the square root just by multiplying the inverse square root by the original number: sqrt(x) = x * (1 / sqrt(x)). There are also quite a lot of functions that use the inverse square root directly, vector normalization being a classic example (see the sketch below).
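Here is an illustrative sketch of such a use (my own code, not Quake 3 source; rsqrt stands in for the fast version we are about to build):

```cpp
#include <cmath>

// A typical consumer of the inverse square root: normalizing a 3D vector.
struct Vec3 { float x, y, z; };

float rsqrt(float v) { return 1.0f / std::sqrt(v); }  // placeholder for the fast version

Vec3 Normalize(Vec3 v) {
    // One multiply per component instead of a division by the length.
    float inv_len = rsqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return { v.x * inv_len, v.y * inv_len, v.z * inv_len };
}
```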
Here is the original source code taken from Quake 3's open sourced engine:
```c
float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y;                       // evil floating point bit level hacking
    i  = 0x5f3759df - ( i >> 1 );               // what the fuck?
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//  y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

    return y;
}
```
Math behind the approach
Well, let's combine everything mentioned above and examine how Quake 3's source code manages to calculate square roots efficiently. We will start by investigating how we can represent the log2 of a floating-point number more easily. This will all make sense later, bear with me:

\log_2(x) = \log_2((1 + M/2^{23}) \cdot 2^{E - 127})
          = \log_2(1 + M/2^{23}) + E - 127
          \approx M/2^{23} + k + E - 127
          = (E \cdot 2^{23} + M)/2^{23} + k - 127
          = I_x/2^{23} + k - 127
What this simplification gets us is a representation of the log of the float number defined purely by its integer bit representation I_x. We also reintroduced the constant correction value k that we mentioned in the logarithm section.
What we actually want to find is how to calculate:

y = 1/\sqrt{x} = x^{-1/2}
Here comes the real magic. Let's define our solution as y. If we apply log2 to both sides of the equation, we get the following:

\log_2(y) = -\tfrac{1}{2} \cdot \log_2(x)
Then we can replace the logarithms with the bit representations (I_y and I_x being the integer representations of y and x, built from their mantissas M and exponents E):

I_y/2^{23} + k - 127 = -\tfrac{1}{2} \cdot (I_x/2^{23} + k - 127)
This can further be solved for I_y:

I_y = \tfrac{3}{2} \cdot 2^{23} \cdot (127 - k) - \tfrac{1}{2} \cdot I_x
And if we approximate the division by 2 on the right with a bit shift, which we can do because we have an integer, we can replace the solution above with:

I_y = \mathtt{0x5f3759df} - (I_x >> 1)

where the magic constant 0x5f3759df is exactly \tfrac{3}{2} \cdot 2^{23} \cdot (127 - k) for a suitable choice of k.
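As a quick sanity check on the magic number (working backwards from the constant itself; the value of k below is the one implied by it rather than stated anywhere in the source):

\tfrac{3}{2} \cdot 2^{23} \cdot (127 - 0.0450466) \approx 1597463007 = \mathtt{0x5F3759DF}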
Newton's Iteration
This will give us values that are still quite a bit off from the correct inverse square root, though it's a decent first guess. This is why we need a correction function. One such correction is Newton's iteration. It states that if we have a good approximation x_n of a root of a function f, we can get a better one by iterating with the following formula:

x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}
We can run as many iterations of Newton's method as we want, but since our x_0 is already a decent approximation, we do just one. Choosing f(y) = 1/y^2 - n, whose root is exactly y = 1/\sqrt{n}, where n is our initial number, this gives us the following equation:

y_1 = y_0 \cdot \left(\tfrac{3}{2} - \tfrac{n}{2} \cdot y_0^2\right)

which is exactly the y = y * ( threehalfs - ( x2 * y * y ) ) line in the code.
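For completeness, here is the algebra behind that step (my own expansion; it follows directly from the iteration formula above):

f(y) = y^{-2} - n, \qquad f'(y) = -2 \cdot y^{-3}

y_1 = y_0 - \frac{y_0^{-2} - n}{-2 \cdot y_0^{-3}}
    = y_0 + \tfrac{y_0}{2} \cdot (1 - n \cdot y_0^2)
    = y_0 \cdot \left(\tfrac{3}{2} - \tfrac{n}{2} \cdot y_0^2\right)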
And this concludes the mathematical part of this article. The rest is just implementation: we get the integer representation by reinterpreting the float's memory as an int and then apply these formulas.
My C++ implementation
I reimplemented this function more cleanly in C++ for this article:
```cpp
#include <cmath>
#include <cstdint>
#include <iostream>

std::float_t QuickInverseSqrt(std::float_t number) {
  // Make a float and an int variable occupy the same memory space
  union {
    std::float_t number_representation;
    std::int32_t integer_representation;
  };
  number_representation = number;
  // The magic constant: 3/2 * 2^23 * (127 - k)
  const std::int32_t kCorrection = 0x5f3759df;
  integer_representation = kCorrection - (integer_representation >> 1);
  // One iteration of Newton's method: y * (3/2 - (n/2) * y^2)
  std::float_t result =
      number_representation *
      (1.5f - (number * 0.5f) *
                  (number_representation * number_representation));
  return result;
}

std::float_t QuickSqrt(std::float_t n) {
  std::float_t y = QuickInverseSqrt(n);
  return y * n;  // sqrt(n) = n * (1 / sqrt(n))
}

int main() {
  for (std::int32_t i = 0; i <= 25; i++) {
    std::cout << "sqrt(" << i * i << ") = " << QuickSqrt(i * i)
              << std::endl;
  }
}
```
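If you are on C++20, std::bit_cast expresses the same bit reinterpretation without union type punning. A minimal sketch, assuming a C++20 compiler (QuickInverseSqrt20 is just an illustrative name):

```cpp
#include <bit>
#include <cstdint>

float QuickInverseSqrt20(float number) {
    std::int32_t i = std::bit_cast<std::int32_t>(number);
    i = 0x5f3759df - (i >> 1);               // the magic bit-level guess
    float y = std::bit_cast<float>(i);
    return y * (1.5f - (number * 0.5f) * y * y);  // one Newton iteration
}
```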
Or, if you prefer, you can test it out on Godbolt.
Conclusion
This was a bit of a complex topic, but in game development such optimizations are crucial to keeping the game at the best frame rate possible. They also save battery life on mobile devices and make operations faster, leaving more time to process game objects and render better graphics.