Setting extra bits in a bool makes it true and false at the same time

Published: 2020/9/22 3:41:14 · Tags: c++, boolean, undefined-behavior, evaluation, abi


Question

If I take a bool variable and set its second bit to 1, the variable evaluates to true and false at the same time. Compile the following code with GCC 6.3 and the -g option (gcc-v6.3.0/Linux/RHEL6.0-2016-x86_64/bin/g++ -g main.cpp -o mytest_d), then run the executable. You get the following output:

How can T be equal to true and false at the same time?

       value   bits 
       -----   ---- 
    T:   1     0001
after bit change
    T:   3     0011
T is true
T is false

This can happen when you call a function written in a different language (say, Fortran) where the definitions of true and false differ from C++. In Fortran, if any bit is nonzero the value is true; if all bits are zero the value is false.

#include <iostream>
#include <bitset>

using namespace std;

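// Sets the two lowest bits of the first byte of the object at val.
// Applied to a bool, this leaves the byte as 0b11 (3), which is not a
// valid bool value, so any later use of that bool is undefined behavior.
void set_bits_to_1(void* val){
  char *x = static_cast<char *>(val);

  for (int i = 0; i<2; i++ ){
    *x |= (1UL << i);
  }
}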

int main(int argc,char *argv[])
{

  bool T = 3;

  cout <<"       value   bits " <<endl;
  cout <<"       -----   ---- " <<endl;
  cout <<"    T:   "<< T <<"     "<< bitset<4>(T)<<endl;

  set_bits_to_1(&T);


  bitset<4> bit_T = bitset<4>(T);
  cout <<"after bit change"<<endl;
  cout <<"    T:   "<< T <<"     "<< bit_T<<endl;

  if (T ){
    cout <<"T is true" <<endl;
  }

  if ( T == false){
    cout <<"T is false" <<endl;
  }


}

And here is a Fortran function that is not compatible with C++ when compiled with ifort:

       logical*1 function return_true()
         implicit none

         return_true = 1;

       end function return_true

Solution

In C++ the bit representation (and even the size) of a bool is implementation-defined. It is generally implemented as a char-sized type that takes 1 or 0 as its possible values.

If you set its value to anything different from the allowed ones (in this specific case by aliasing a bool through a char and modifying its bit representation), you are breaking the rules of the language, so anything can happen. In particular, it's explicitly specified in the standard that a "broken" bool may behave as both true and false (or neither true nor false) at the same time:

Using a bool value in ways described by this International Standard as "undefined," such as by examining the value of an uninitialized automatic object, might cause it to behave as if it is neither true nor false

(C++11, [basic.fundamental], note 47)

In this particular case, you can see how it ended up in this bizarre situation: the first if gets compiled to

    movzx   eax, BYTE PTR [rbp-33]
    test    al, al
    je      .L22

which loads T into eax (with zero extension) and skips the print if it's all zero; the next if instead is

    movzx   eax, BYTE PTR [rbp-33]
    xor     eax, 1
    test    al, al
    je      .L23

The test if(T == false) is transformed to if(T^1), which flips just the low bit. This would be ok for a valid bool, but for your "broken" one it doesn't cut it.

Notice that this bizarre sequence is only generated at low optimization levels. At higher levels this generally boils down to a zero/nonzero check, and a sequence like yours is likely to become a single test/conditional branch. You will still get bizarre behavior in other contexts, e.g. when adding bool values to other integers:

int foo(bool b, int i) {
    return i + b;
}

becomes

foo(bool, int):
        movzx   edi, dil
        lea     eax, [rdi+rsi]
        ret

where dil is "trusted" to be 0/1.

If your program is all C++, then the solution is simple: don't break bool values this way, avoid messing with their bit representation and everything will go well; in particular, even if you assign from an integer to a bool the compiler will emit the necessary code to make sure that the resulting value is a valid bool, so your bool T = 3 is indeed safe, and T will end up with a true in its guts.

If instead you need to interoperate with code written in other languages that may not share the same idea of what a bool is, just avoid bool for "boundary" code, and marshal it as an appropriately-sized integer. It will work in conditionals & co. just as fine.

Update about the Fortran/interoperability side of the issue

Disclaimer: all I know of Fortran is what I read this morning in standard documents, and that I have some punched cards with Fortran listings that I use as bookmarks, so go easy on me.

First of all, this kind of language interoperability stuff isn't part of the language standards, but of the platform ABI. As we are talking about Linux x86-64, the relevant document is the System V x86-64 ABI.

Second, nowhere does it specify that the C _Bool type (which is defined to be the same as C++ bool at 3.1.2 note †) has any kind of compatibility with Fortran LOGICAL. In particular, at 9.2.2, table 9.2 specifies that "plain" LOGICAL is mapped to signed int. About TYPE*N types it says that

The "TYPE*N" notation specifies that variables or aggregate members of type TYPE shall occupy N bytes of storage.

(ibid.)

There's no equivalent type explicitly specified for LOGICAL*1, and that's understandable: it's not even standard. Indeed, if you try to compile a Fortran program containing a LOGICAL*1 in Fortran 95-compliant mode, you get diagnostics about it, both by ifort

./example.f90(2): warning #6916: Fortran 95 does not allow this length specification.   [1]

    logical*1, intent(in) :: x

------------^

and by gfort

./example.f90:2:13:
     logical*1, intent(in) :: x
             1
Error: GNU Extension: Nonstandard type declaration LOGICAL*1 at (1)

so the waters are already muddy; combining the two rules above, I'd go for signed char to be safe.

However: the ABI also specifies:

The values for type LOGICAL are .TRUE. implemented as 1 and .FALSE. implemented as 0.

So, if you have a program that stores anything besides 1 and 0 in a LOGICAL value, you are already out of spec on the Fortran side! You say:

A fortran logical*1 has same representation as bool, but in fortran if bits are 00000011 it is true, in C++ it is undefined.

This last statement is not true: the Fortran standard is representation-agnostic, and the ABI explicitly says the contrary. Indeed, you can easily see this in action by checking gfort's output for a LOGICAL comparison:

integer function logical_compare(x, y)
    logical, intent(in) :: x
    logical, intent(in) :: y
    if (x .eqv. y) then
        logical_compare = 12
    else
        logical_compare = 24
    end if
end function logical_compare

becomes

logical_compare_:
        mov     eax, DWORD PTR [rsi]
        mov     edx, 24
        cmp     DWORD PTR [rdi], eax
        mov     eax, 12
        cmovne  eax, edx
        ret

You'll notice that there's a straight cmp between the two values, without normalizing them first (unlike ifort, which is more conservative in this regard).

Even more interesting: regardless of what the ABI says, ifort by default uses a nonstandard representation for LOGICAL; this is explained in the -fpscomp logicals switch documentation, which also specifies some interesting details about LOGICAL and cross-language compatibility:

Specifies that integers with a non-zero value are treated as true, integers with a zero value are treated as false. The literal constant .TRUE. has an integer value of 1, and the literal constant .FALSE. has an integer value of 0. This representation is used by Intel Fortran releases before Version 8.0 and by Fortran PowerStation.

The default is fpscomp nologicals, which specifies that odd integer values (low bit one) are treated as true and even integer values (low bit zero) are treated as false.

The literal constant .TRUE. has an integer value of -1, and the literal constant .FALSE. has an integer value of 0. This representation is used by Compaq Visual Fortran. The internal representation of LOGICAL values is not specified by the Fortran standard. Programs which use integer values in LOGICAL contexts, or which pass LOGICAL values to procedures written in other languages, are non-portable and may not execute correctly. Intel recommends that you avoid coding practices that depend on the internal representation of LOGICAL values.

(emphasis added)

Now, the internal representation of a LOGICAL normally shouldn't be a problem, as, from what I gather, if you play "by the rules" and don't cross language boundaries you aren't going to notice it. For a standard-compliant program there's no "straight conversion" between INTEGER and LOGICAL. The only ways I can see to shove an INTEGER into a LOGICAL seem to be TRANSFER, which is intrinsically non-portable and gives no real guarantees, or the non-standard INTEGER <-> LOGICAL conversion on assignment.

The latter is documented by gfort to always map nonzero -> .TRUE. and zero -> .FALSE. You can see that in all cases code is generated to make this happen (even though it's convoluted code in the case of ifort with the legacy representation), so it seems you cannot shove an arbitrary integer into a LOGICAL this way.

logical*1 function integer_to_logical(x)
    integer, intent(in) :: x
    integer_to_logical = x
    return
end function integer_to_logical

integer_to_logical_:
        mov     eax, DWORD PTR [rdi]
        test    eax, eax
        setne   al
        ret

The reverse conversion for a LOGICAL*1 is a straight integer zero-extension (gfort). So, to honor the contract in the documentation linked above, it clearly expects the LOGICAL value to be 0 or 1.

But in general, the situation for these conversions is a bit of a mess, so I'd just stay away from them.

So, long story short: avoid putting INTEGER data into LOGICAL values, as it is bad even in Fortran, and make sure to use the correct compiler flag to get the ABI-compliant representation for booleans, and interoperability with C/C++ should be fine. But to be extra safe, I'd just use plain char on the C++ side.

Finally, from what I gather from the documentation, ifort has some built-in support for interoperability with C, including booleans; you may try to leverage it.
