While working on my SqueakJS VM, it became necessary to deconstruct floating point numbers into their mantissa and exponent parts, and to assemble them again. Peeking into the C sources of the regular VM, I saw they use the frexp() and ldexp() functions found in the standard C math library.
Unfortunately, JavaScript does not provide these two functions. But surely someone must have needed them before me, right? Sure enough, a Google search came up with a few implementations. However, an hour later I was convinced that none of them is fully equivalent to the C functions. They were imprecise: deconstructing a float using frexp() and reconstructing it with ldexp() did not result in the original value. But that is the basic use case: for all float values, if
[mantissa, exponent] = frexp(value)
then
value = ldexp(mantissa, exponent)
even if the value is subnormal. None of the implementations (even the complex ones) really worked.
I had to implement it myself, and here is my implementation (also as JSFiddle):
function frexp(value) {
    if (value === 0) return [value, 0];
    var data = new DataView(new ArrayBuffer(8));
    data.setFloat64(0, value);
    // the upper 32 bits hold the sign, 11 exponent bits, and 20 mantissa bits
    var bits = (data.getUint32(0) >>> 20) & 0x7FF;
    if (bits === 0) { // denormal
        data.setFloat64(0, value * Math.pow(2, 64)); // exp + 64
        bits = ((data.getUint32(0) >>> 20) & 0x7FF) - 64;
    }
    var exponent = bits - 1022; // bias 1023, minus 1 for a mantissa in 0.5...1
    var mantissa = ldexp(value, -exponent);
    return [mantissa, exponent];
}

function ldexp(mantissa, exponent) {
    // split the scaling into up to three factors that are each representable
    var steps = Math.min(3, Math.ceil(Math.abs(exponent) / 1023));
    var result = mantissa;
    for (var i = 0; i < steps; i++)
        result *= Math.pow(2, Math.floor((exponent + i) / steps));
    return result;
}

My frexp() uses a DataView to extract the exponent bits of the IEEE-754 float representation. If those bits are 0 then it is a subnormal. In that case I normalize it by multiplying with 2^64, getting the bits again, and subtracting 64. After applying the bias, the exponent is ready, and used to get the mantissa by canceling out the exponent from the original value.
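The exponent extraction in frexp() can be tried in isolation. The following is a minimal sketch, not part of the original code: `exponentBits` is a hypothetical helper name introduced here, and it only repeats the DataView step from frexp() to show the raw biased exponent for a few values, including what happens to a subnormal before and after scaling by 2^64.

```javascript
// Hypothetical helper (not in the original post): returns the raw 11-bit
// biased exponent field of a 64-bit float, exactly as frexp() reads it.
function exponentBits(value) {
    var data = new DataView(new ArrayBuffer(8));
    data.setFloat64(0, value);                  // DataView is big-endian by default
    return (data.getUint32(0) >>> 20) & 0x7FF;  // shift out 20 mantissa bits, mask off the sign
}

var one = exponentBits(1);                           // 1023, since 1 = 1.0 * 2^0 and the bias is 1023
var half = exponentBits(0.5);                        // 1022
var subnormal = exponentBits(5e-324);                // 0 marks a subnormal (or zero)
var scaled = exponentBits(5e-324 * Math.pow(2, 64)); // 13, since 2^-1074 * 2^64 = 2^-1010

// Same arithmetic as in frexp(): undo the scaling, then apply the bias.
var exponentOfSmallest = (scaled - 64) - 1022;       // -1073, i.e. 5e-324 = 0.5 * 2^-1073

console.log(one, half, subnormal, scaled, exponentOfSmallest);
```

Subtracting 1022 rather than 1023 is what moves the mantissa from the IEEE-754 range 1...2 into the 0.5...1 range that frexp() returns.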
My ldexp() is pretty straightforward, except it needs to be able to multiply by very large and very small numbers. The smallest positive float is 0.5 × 2^-1073, and to get its mantissa we need to multiply by 2^1073. That is larger than the largest representable power of two, 2^1023. By multiplying in steps we can deal with that. Three steps are needed for e.g. ldexp(5e-324, 1023+1074), which would otherwise result in Infinity.
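To see why the multiplication has to be split, here is a small sketch of that exact case. The variable names are made up for this illustration; the step formula is the one used in ldexp().

```javascript
var tiny = 5e-324;           // smallest positive float, 2^-1074
var exponent = 1023 + 1074;  // 2097; the exact result would be 2^1023

// One step: 2^2097 is not representable, so Math.pow() overflows first.
var naive = tiny * Math.pow(2, exponent);

// Two steps are still not enough: 2^1048 and 2^1049 overflow as well.
var twoSteps = tiny * Math.pow(2, Math.floor(exponent / 2))
                    * Math.pow(2, Math.floor((exponent + 1) / 2));

// Three steps: each factor is 2^699, and every intermediate result is exact.
var threeSteps = tiny;
for (var i = 0; i < 3; i++)
    threeSteps *= Math.pow(2, Math.floor((exponent + i) / 3));

console.log(naive);                             // Infinity
console.log(twoSteps);                          // Infinity
console.log(threeSteps === Math.pow(2, 1023));  // true
```

The `Math.floor((exponent + i) / steps)` terms always add up to `exponent`, so the split changes only the order of the multiplications, not the overall scale factor.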
So there you have it. Hope it's useful to someone.
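A quick way to check the round-trip property on a few awkward values is sketched below. The two functions are copied from this post so the block runs on its own; the list of samples is an arbitrary choice made for this sketch and is no substitute for an exhaustive test.

```javascript
// frexp() and ldexp() as given in the post.
function frexp(value) {
    if (value === 0) return [value, 0];
    var data = new DataView(new ArrayBuffer(8));
    data.setFloat64(0, value);
    var bits = (data.getUint32(0) >>> 20) & 0x7FF;
    if (bits === 0) { // denormal
        data.setFloat64(0, value * Math.pow(2, 64)); // exp + 64
        bits = ((data.getUint32(0) >>> 20) & 0x7FF) - 64;
    }
    var exponent = bits - 1022;
    var mantissa = ldexp(value, -exponent);
    return [mantissa, exponent];
}

function ldexp(mantissa, exponent) {
    var steps = Math.min(3, Math.ceil(Math.abs(exponent) / 1023));
    var result = mantissa;
    for (var i = 0; i < steps; i++)
        result *= Math.pow(2, Math.floor((exponent + i) / steps));
    return result;
}

// Sample values picked for this sketch (an assumption, not from the post).
var samples = [1, -1, 0.1, Math.PI, 123456.789,
               Number.MAX_VALUE,
               2.2250738585072014e-308,  // smallest normal float
               5e-324,                   // smallest subnormal float
               -1.5e-310];               // a negative subnormal

var failures = [];
samples.forEach(function (value) {
    var parts = frexp(value);
    var mantissa = parts[0], exponent = parts[1];
    // The mantissa should lie in 0.5...1 (ignoring the sign) ...
    var inRange = Math.abs(mantissa) >= 0.5 && Math.abs(mantissa) < 1;
    // ... and ldexp() should reproduce the original value exactly.
    var roundTrips = ldexp(mantissa, exponent) === value;
    if (!inRange || !roundTrips) failures.push(value);
});

console.log(failures.length === 0 ? "all samples round-trip" : failures);
```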
Update: I wrote this in 2014. Since 2016 there are npm packages with (apparently) correct implementations that are a lot more optimized than mine. They also have hundreds of lines of code so are much less understandable, but for production use they are likely a better choice: math-float64-frexp and math-float64-ldexp.
Correction: The code I originally posted here for ldexp() still had a bug: it did not test for exponents that are too small. I fixed it above, and updated the JSFiddle, too. Also, Nicolas Cellier noticed other rounding and underflow problems; his suggestions for ldexp() are now used above.
Comments
I am trying to understand your code completely.
One question: why do you subtract 1022 from the exponent?
exponent = bits - 1022
Thanks for your help.
Greetings Michael
for a moment I thought you had found a bug ... but I think it's correct after all:
The exponent is stored with a bias of 1023, but then there is another implicit -1 because frexp() returns the mantissa in the range of 0.5...1 (whereas IEEE-754 normalizes it to 1...2), that's why I only subtract 1022 not 1023. This is needed so that mantissa * 2 ^ exponent equals the original value.
Thanks.
That sounds correct. Your first answer confused me.
I found your blog because I'm searching for a conversion from 32/64-bit floats to 16-bit floats, and vice versa.
Me again. Now I understand the code completely.
For German readers, I found a very helpful article about it:
http://www.iti.fh-flensburg.de/lang/informatik/ieee-format.htm
See Squeak image side fallback implementation - the one from Kernel-nice.900.mcz -
http://lists.squeakfoundation.org/pipermail/packages/2015-February/007538.html