Posits are pretty interesting
Sep. 28th, 2022 03:49 pm

Following up on my previous two entries, let's talk about posits!
Now in my previous entries I didn't talk about posits, or unums more generally. There are two reasons for this. First, unums aren't exactly traditional floating point. Secondly, unums were more of a proposal than something that people were actually using; certainly they weren't a native data type on any machine.
But now posits have gotten a written standard and, moreover, people are making hardware implementations of them. And the format has some interesting features, so let's talk about them!
First off, how do posits generally work? Most floating point formats have a sign bit, a fixed number of exponent bits, and a fixed number of mantissa bits. Now, if you read my previous entries, you know things don't always break down as cleanly as that, particularly as regards the sign bit -- and then IEEE 754 decimal goes and mixes the exponent and mantissa bits together in even more confusing ways. But most of them can be pretty much described in terms of a sign bit, exponent bits, and mantissa bits.
Posits have a sign bit, a variable number of regime bits, and then the remaining bits are split between exponent and mantissa; the number of exponent bits is fixed by the format (up to the number remaining), and then whatever's left over is mantissa. (Apparently in standard posits, the cap on exponent bits is always 2. Huh. That's not a lot.) The idea is that the exponent is made up of a combination of the regime and exponent bits, the overall exponent being 2^k · r + e, where k is the number of exponent bits, r is the value coming from the regime bits (the "regime value"), and e is the value coming from the exponent bits. So it still breaks down overall as sign/exponent/mantissa, but with the exponent having a variable number of bits and being weirdly encoded.
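To make that concrete, here's a minimal sketch of the combination in Python (my own illustration, not anything from the standard):

    # Overall exponent of a posit: 2^k * r + e, where k is the number of
    # exponent bits fixed by the format (always 2 in standard posits),
    # r is the regime value, and e is the exponent-field value.
    def overall_exponent(k, r, e):
        return (2 ** k) * r + e

    # With k = 2, each regime step moves the exponent by 4, and the
    # exponent bits fill in the values in between:
    assert overall_exponent(2, 0, 0) == 0
    assert overall_exponent(2, 0, 3) == 3
    assert overall_exponent(2, 1, 0) == 4
    assert overall_exponent(2, -1, 2) == -2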
But, obviously, that sort of thing is not why I'm writing this, the weird details are. So what are the weird details?
First off, the sign bit. If the sign bit is 1, the number is negative... but the representation is not straightforward sign-magnitude! For a negative number, you take the 2's-complement of the entire bit string!
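In code terms, that's just integer negation modulo 2^n. A sketch, holding the bit pattern in a Python int:

    # Negate an n-bit posit: take the 2's-complement of the entire bit string.
    def posit_negate(bits, n):
        return (-bits) & ((1 << n) - 1)

    # In standard 8-bit posits, 0b01000000 encodes 1.0; its negation
    # 0b11000000 encodes -1.0.
    assert posit_negate(0b01000000, 8) == 0b11000000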
I mentioned this as a possibility used by some old formats in my earlier entry, but at the time, I wasn't sure if I was reading the documentation correctly -- a format wouldn't really do that, would it? Well, yes, it would! Posits do it!
Doing a format this way raises a question I failed to consider when I wrote about this earlier: If your format works this way, then what does a 1 followed by all 0s represent? Since its 2's-complement won't begin with a 0, this rule leaves no way to interpret it. For older formats that did this, I assume the answer is simply "this is an invalid value in this format". For posits, though, this is not an invalid value, but rather an exceptional one! It's called "NaR", "not a real", and it's basically the same as IEEE 754's NaN, although with ±∞ rolled in as well.
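So checking for NaR is a single-pattern test; a sketch:

    # NaR is the one n-bit pattern (a 1 followed by all 0s) that the
    # 2's-complement rule can't account for.
    def is_nar(bits, n):
        return bits == 1 << (n - 1)

    assert is_nar(0b10000000, 8)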
Anyway, that leaves the question of interpreting the positive values. First, the regime bits. Because the regime field is variable-length, it's actually encoded in unary. Obviously that's not the only way they could have handled this, but it's hard to imagine the extra complication from other ways would be worth it. The regime bits consist of either a number of 1's followed by a 0, or a number of 0's followed by a 1. (Or, 0's or 1's that just run to the end. That's legal too!) In the former case, it indicates a nonnegative regime value, which is one less than the number of 1's. In the latter case, it indicates a negative regime value, which is the negative of the number of 0's.
However, there's a special case here too: If the 0's run to the end, meaning you have the lowest possible regime value (and so the lowest possible exponent), then the number is 0. So note that 0 gets the all-zeros bit pattern. This is neat because it avoids introducing redundant representations of 0, and it avoids both of the ways they typically arise. Like, in IEEE 754, you've got ±0. That doesn't happen here because of the use of 2's-complement; 0 has a bit pattern of 0, but the 2's-complement of 0 is again 0, rather than something beginning with a 1.
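Putting those last two paragraphs together, here's a sketch of reading off the regime (the helper name is mine; it assumes the 0 and NaR patterns have already been checked for, and that the sign bit is 0):

    def decode_regime(bits, n):
        # Returns (regime value, bits consumed including sign and terminator).
        # Assumes a positive posit: not 0, not NaR, sign bit 0.
        first = (bits >> (n - 2)) & 1            # first regime bit
        run, pos = 0, n - 2
        while pos >= 0 and ((bits >> pos) & 1) == first:
            run += 1
            pos -= 1
        consumed = 1 + run + (1 if pos >= 0 else 0)
        return (run - 1 if first else -run), consumed

    # A run of 1's gives one less than its length; a run of 0's gives
    # the negative of its length:
    assert decode_regime(0b01100000, 8) == (1, 4)    # "11" then "0"
    assert decode_regime(0b00010000, 8) == (-2, 4)   # "00" then "1"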
(This is another thing I didn't think about when writing my previous entries. If you're going to create a format that takes the 2's-complement-the-whole-thing approach to negatives, then you'd better make sure that in it, 0 gets a bit pattern of 0, something not true in all formats.)
But also -- old formats typically had redundant representations of 0 because, on the lowest exponent, you could set the mantissa to anything. That doesn't happen in IEEE 754 because you get denormals instead. Posits don't have denormals. But there are no redundant representations here either, because if this condition occurs, there are no bits left to make up the mantissa! Hooray!
Anyway, after the regime bits are the exponent bits. The exponent bits are read as an unsigned integer, and there's a fixed number of them, or rather a maximum number of them, since maybe there isn't enough room for all of them. However, there's a design decision here I don't understand. I would think that if there's not enough room for all the exponent bits, it would be treated as if the missing bits were 0, i.e., the existing ones would be padded on the right. That way, in these extreme regimes, you'd lose some possible exponents, yes, but in an even manner. Instead, however, whatever's there is read as a number by itself, i.e., it's effectively padded on the left with zeroes instead; this makes the resulting loss of possible exponents rather more uneven. That doesn't make sense to me -- is it easier to implement in circuitry or something? I wouldn't think so. But that's how they decided to do it.
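Here's the difference in a sketch, with the standard's 2 exponent bits, following the description above:

    # Say es = 2, but the regime ran long enough that only one exponent
    # bit, a 1, actually fit.
    fitted = "1"

    # Padding on the right (what I'd have expected): read "10", so e = 2.
    assert int(fitted.ljust(2, "0"), 2) == 2

    # Padding on the left (what posits do, per the above): read "01", so e = 1.
    assert int(fitted.rjust(2, "0"), 2) == 1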
Finally, whatever's left is the mantissa. As mentioned above, there are no denormals; this is a hidden-bit format, so the mantissa should always be read with an implicit "1." before it. Hm.
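If you want to see all the pieces assembled, here's a small end-to-end decoder sketch (again my own illustration, following the description in this entry rather than the standard's text; it returns exact values as Fractions):

    from fractions import Fraction

    def decode_posit(bits, n, es=2):
        if bits == 0:
            return Fraction(0)
        if bits == 1 << (n - 1):
            raise ValueError("NaR")
        sign = (bits >> (n - 1)) & 1
        if sign:
            bits = (-bits) & ((1 << n) - 1)      # 2's-complement the whole string
        # Regime: a unary run of identical bits after the sign bit.
        first = (bits >> (n - 2)) & 1
        run, pos = 0, n - 2
        while pos >= 0 and ((bits >> pos) & 1) == first:
            run += 1
            pos -= 1
        pos -= 1                                  # skip the terminating bit, if present
        r = run - 1 if first else -run
        # Exponent: up to es bits; if truncated, read what's there by itself.
        nexp = max(0, min(es, pos + 1))
        e = (bits >> (pos + 1 - nexp)) & ((1 << nexp) - 1) if nexp else 0
        pos -= nexp
        # Mantissa: whatever's left, with the hidden "1." in front.
        nfrac = max(0, pos + 1)
        mantissa = 1 + Fraction(bits & ((1 << nfrac) - 1), 1 << nfrac)
        return (-1 if sign else 1) * mantissa * Fraction(2) ** ((1 << es) * r + e)

    assert decode_posit(0b01000000, 8) == 1
    assert decode_posit(0b11000000, 8) == -1
    assert decode_posit(0b01000001, 8) == Fraction(9, 8)
    assert decode_posit(0b01110000, 8) == 256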
And that's posits! Kind of neat.
There's one thing I haven't mentioned -- one of the benefits to the 2's-complement-the-whole-thing approach is that it means that (ignoring NaR) posits compare the same as 2's-complement integers, which is convenient. IEEE 754 was made so that it compares the same as sign-magnitude integers, but nobody uses those anymore, so this isn't such a benefit (unless you're using an unsigned variant or just sticking to nonnegatives). I guess a 1's-complement-the-whole-thing approach would yield the same benefit with 1's-complement integers, which thankfully also aren't used anymore; although, since comparing those is essentially the same as comparing 2's-complement integers (ignoring that equality is slightly different), it would still compare conveniently, so that's nice.
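A quick demonstration, reusing decode_posit from the sketch above:

    # Sorting posit bit patterns as 2's-complement signed integers sorts
    # them by the values they encode (NaR aside).
    def as_signed(bits, n):
        return bits - (1 << n) if bits >> (n - 1) else bits

    patterns = [0b01000000, 0b11000000, 0b00000000]    # 1, -1, 0 in 8 bits
    by_int = sorted(patterns, key=lambda b: as_signed(b, 8))
    by_value = sorted(patterns, key=lambda b: decode_posit(b, 8))
    assert by_int == by_value == [0b11000000, 0b00000000, 0b01000000]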
Anyway, this makes me realize that, in my previous analyses, this is an aspect I completely left out -- which of these formats compare the same as integer formats? (Unsigned, 2's-complement, 1's-complement, or sign-magnitude; remember, I'm looking at older formats designed back when these latter ones were still current!) And, in the case of that balanced ternary computer, do floats compare the same as integers there? :P
Well -- I'll get back to that another time!
-Harry