When a 64bit int is cast to 64bit float in C/C++ and doesn't have an exact match, will it always land on a non-fractional number?

Question

Welcome To Ask or Share your Answers For Others

When a 64bit int is cast to 64bit float in C/C++ and doesn't have an exact match, will it always land on a non-fractional number?

asked Jan 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

When a 64bit int is cast to 64bit float in C/C++ and doesn't have an exact match, will it always land on a non-fractional number?

When int64_t is cast to double and doesn't have an exact match, to my knowledge I get a sort of best-effort-nearest-value equivalent in double. For example, 9223372036854775000 in int64_t appears to end up as 9223372036854774784.0 in double:

#include <stdio.h>

int main(int argc, const char **argv) {
    printf("Corresponding double: %f
", (double)9223372036854775000LL);
    // Outputs: 9223372036854774784.000000
    return 0;
}

It appears to me as if an int64_t cast to a double always ends up on as a clean non-fractional number, even in this higher number range where double has really low precision. However, I just observed this from random attempts. Is this guaranteed to happen for any value of int64_t cast to a double?

And if I cast this non-fractional double back to int64_t, will I always get the exact corresponding 64bit int with the .0 chopped off? (Assuming it doesn't overflow during the conversion back.) Like here:

#include <inttypes.h>
#include <stdio.h>

int main(int argc, const char **argv) {
    printf("Corresponding double: %f
", (double)9223372036854775000LL);
    // Outputs: 9223372036854774784.000000
    printf("Corresponding int to corresponding double: %" PRId64 "
",
           (int64_t)((double)9223372036854775000LL));
    // Outputs: 9223372036854774784
    return 0;
}

Or can it be imprecise and get me the "wrong" int in some corner cases?

Intuitively and from my tests the answer to both points appears to be "yes", but if somebody with a good formal understanding of the floating point standards and the maths behind it could confirm this that would be really helpful to me. I would also be curious if any known more aggressive optimizations like gcc's -Ofast are known to break any of this.

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-01-24T02:51:14+0000

In general case yes, both should be true. The floating point base needs to be - if not 2, then at least integer and given that, an integer converted to nearest floating point value can never produce non-zero fractions - either the precision suffices or the lowest-order integer digits in the base of the floating type would be zeroed. For example in your case your system uses ISO/IEC/IEEE 60559 binary floating point numbers. When inspected in base 2, it can be seen that the trailing digits of the value are indeed zeroed:

>>> bin(9223372036854775000)
'0b111111111111111111111111111111111111111111111111111110011011000'
>>> bin(9223372036854774784)
'0b111111111111111111111111111111111111111111111111111110000000000'

The conversion of a double without fractions to an integer type, given that the value of the double falls within the range of the integer type should be exact...

Though you still might encounter a quality-of-implementation issue, or an outright bug - for example MSVC currently has a compiler bug where a round-trip conversion of unsigned 32-bit value with MSB set (or just double value between 231 and 232-1 converted to unsigned int) would "overflow" in the conversion and always result in exactly 231.

Categories

When a 64bit int is cast to 64bit float in C/C++ and doesn't have an exact match, will it always land on a non-fractional number?

When a 64bit int is cast to 64bit float in C/C++ and doesn't have an exact match, will it always land on a non-fractional number?

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags