Does the C++ standard allow for an uninitialized bool to crash a program?


Question


I know that undefined behaviour in C++ can pretty much allow the compiler to do anything it wants. However, I had a crash that surprised me, as I assumed that the code was safe enough.

In this case, the real problem happened only on a specific platform using a specific compiler, and only if optimization was enabled.

I tried several things in order to reproduce the problem and simplify it to the maximum. Here's an extract of a function called Serialize, which takes a bool parameter and copies the string "true" or "false" to an existing destination buffer.

If this function came up in a code review, would there be any way to tell that it could, in fact, crash if the bool parameter were an uninitialized value?

// Zero-filled global buffer of 16 characters
char destBuffer[16];

void Serialize(bool boolValue) {
    // Determine which string to print based on boolValue
    const char* whichString = boolValue ? "true" : "false";

    // Compute the length of the string we selected
    const size_t len = strlen(whichString);

    // Copy string into destination buffer, which is zero-filled (thus already null-terminated)
    memcpy(destBuffer, whichString, len);
}

If this code is compiled with clang 5.0.0 with optimizations enabled, it will/can crash.

The expected ternary-operator boolValue ? "true" : "false" looked safe enough for me, I was assuming, "Whatever garbage value is in boolValue doesn't matter, since it will evaluate to true or false anyhow."

I have set up a Compiler Explorer example that shows the problem in the disassembly; here is the complete example. Note: in order to repro the issue, the combination I've found that worked is Clang 5.0.0 with -O2 optimisation.

#include <iostream>
#include <cstring>

// Simple struct, with an empty constructor that doesn't initialize anything
struct FStruct {
    bool uninitializedBool;

   __attribute__ ((noinline))  // Note: the constructor must be declared noinline to trigger the problem
   FStruct() {};
};

char destBuffer[16];

// Small utility function that copies the string "true" or "false" into destBuffer depending on the value of the parameter
void Serialize(bool boolValue) {
    // Determine which string to print depending if 'boolValue' is evaluated as true or false
    const char* whichString = boolValue ? "true" : "false";

    // Compute the length of the string we selected
    size_t len = strlen(whichString);

    memcpy(destBuffer, whichString, len);
}

int main()
{
    // Locally construct an instance of our struct here on the stack. The bool member uninitializedBool is uninitialized.
    FStruct structInstance;

    // Serialize the (uninitialized) bool as "true" or "false" into destBuffer
    Serialize(structInstance.uninitializedBool);
    return 0;
}

The problem arises because of the optimizer: it was clever enough to deduce that the strings "true" and "false" differ in length by only 1. So instead of really calculating the length, it uses the value of the bool itself, which should technically be either 0 or 1, and goes like this:

const size_t len = strlen(whichString); // original code
const size_t len = 5 - boolValue;       // clang clever optimization

While this is "clever", so to speak, my question is: Does the C++ standard allow a compiler to assume a bool can only have an internal numerical representation of '0' or '1' and use it in such a way?

Or is this a case of implementation-defined behaviour, in which case the implementation assumed that all its bools will only ever contain 0 or 1, and any other value is undefined behaviour territory?

Answer

Yes, ISO C++ allows (but doesn't require) implementations to make this choice.

But also note that ISO C++ allows a compiler to emit code that crashes on purpose (e.g. with an illegal instruction) if the program encounters UB, e.g. as a way to help you find errors. (Or because it's a DeathStation 9000. Being strictly conforming is not sufficient for a C++ implementation to be useful for any real purpose.) So ISO C++ would allow a compiler to make asm that crashed (for totally different reasons) even on similar code that read an uninitialized uint32_t, even though that's required to be a fixed-layout type with no trap representations.

It's an interesting question about how real implementations work, but remember that even if the answer was different, your code would still be unsafe because modern C++ is not a portable version of assembly language.


You're compiling for the x86-64 System V ABI, which specifies that a bool as a function arg in a register is represented by the bit-patterns false=0 and true=1 in the low 8 bits of the register (see footnote 1). In memory, bool is a 1-byte type that again must have an integer value of 0 or 1.

(An ABI is a set of implementation choices that compilers for the same platform agree on so they can make code that calls each other's functions, including type sizes, struct layout rules, and calling conventions.)

ISO C++ doesn't specify it, but this ABI decision is widespread because it makes bool->int conversion cheap (just zero-extension). I'm not aware of any ABIs that don't let the compiler assume 0 or 1 for bool, for any architecture (not just x86). It allows optimizations like implementing !mybool with xor eax,1 to flip the low bit (see the Q&A "Any possible code that can flip a bit/integer/bool between 0 and 1 in single CPU instruction"), or compiling a&&b to a bitwise AND for bool types. Some compilers do actually take advantage of this (see the Q&A "Boolean values as 8 bit in compilers. Are operations on them inefficient?").
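To make the benefit concrete, here is a minimal sketch of the identities a compiler can lean on once a bool is guaranteed to hold 0 or 1. The helper names are hypothetical (they are not from the original post), and the raw byte is modelled as a std::uint8_t so the sketch itself stays well-defined C++.

```cpp
#include <cstdint>

// Hypothetical helpers: each one models what the generated asm does to the
// raw byte of a bool, assuming the ABI guarantees that byte is 0 or 1.

// bool -> int is just a zero-extension of the byte.
constexpr int to_int(std::uint8_t raw) { return raw; }

// !b can be a single "xor reg, 1" that flips the low bit.
constexpr std::uint8_t logical_not(std::uint8_t raw) {
    return static_cast<std::uint8_t>(raw ^ 1u);
}

// a && b can be a bitwise AND when neither operand has side effects.
constexpr std::uint8_t logical_and(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a & b);
}
```

For raw bytes 0 and 1 these agree with the language-level operators. For any other byte the shortcuts fall apart: logical_not(2) yields 3, which is neither of the two patterns the ABI allows.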

In general, the as-if rule allows the compiler to take advantage of things that are true on the target platform being compiled for, because the end result will be executable code that implements the same externally-visible behaviour as the C++ source. (With all the restrictions that Undefined Behaviour places on what is actually "externally visible": not with a debugger, but from another thread in a well-formed / legal C++ program.)

The compiler is definitely allowed to take full advantage of an ABI guarantee in its code-gen, and make code like you found which optimizes strlen(whichString) to 5U - boolValue. (BTW, this optimization is kind of clever, but maybe shortsighted vs. branching and inlining memcpy as stores of immediate data; see footnote 2.)

Or the compiler could have created a table of pointers and indexed it with the integer value of the bool, again assuming it was a 0 or 1. (This possibility is what @Barmar's answer suggested.)


Your __attribute__((noinline)) constructor with optimization enabled led to clang just loading a byte from the stack to use as uninitializedBool. It made space for the object in main with push rax (which is smaller and, for various reasons, about as efficient as sub rsp, 8), so whatever garbage was in AL on entry to main is the value it used for uninitializedBool. This is why you actually got values that weren't just 0.

5U - random garbage can easily wrap to a large unsigned value, leading memcpy to go into unmapped memory. The destination is in static storage, not the stack, so you're not overwriting a return address or something.
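The wrap-around is easy to check by hand. The following is a small model of the computation, not clang's actual output: optimized_len and overruns are hypothetical names, kDestSize mirrors the 16-byte destBuffer from the question, and the bool's raw byte is again taken as an unsigned char to keep the sketch well-defined.

```cpp
#include <cstddef>

// Size of destBuffer in the question.
constexpr std::size_t kDestSize = 16;

// Model of the optimized strlen: zero-extend the raw byte of the bool
// and subtract it from 5 in unsigned (wrapping) arithmetic.
constexpr std::size_t optimized_len(unsigned char raw) {
    return std::size_t{5} - raw;
}

// True when the computed length would make memcpy run past destBuffer.
constexpr bool overruns(unsigned char raw) {
    return optimized_len(raw) > kDestSize;
}
```

Raw bytes 0 and 1 give the intended lengths 5 and 4. Bytes 2 through 5 give a length that is too short but harmless. Any byte from 6 upward wraps to a value near SIZE_MAX, and memcpy then reads and writes until it hits an unmapped page.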


Other implementations could make different choices, e.g. false=0 and true=any non-zero value. Then clang probably wouldn't make code that crashes for this specific instance of UB. (But it would still be allowed to if it wanted to.) I don't know of any implementations that choose anything other than what x86-64 does for bool, but the C++ standard allows many things that nobody does or even would want to do on hardware that's anything like current CPUs.

ISO C++ leaves it unspecified what you'll find when you examine or modify the object representation of a bool. (e.g. by memcpying the bool into unsigned char, which you're allowed to do because char* can alias anything. And unsigned char is guaranteed to have no padding bits, so the C++ standard does formally let you hexdump object representations without any UB. Pointer-casting to copy the object representation is different from assigning char foo = my_bool, of course, so booleanization to 0 or 1 wouldn't happen and you'd get the raw object representation.)
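As a minimal sketch of that kind of inspection (object_representation and BoolBytes are hypothetical names), the bytes of a bool can be copied out with memcpy. The buffer is sized with sizeof(bool) because ISO C++ does not fix that size either.

```cpp
#include <array>
#include <cstring>

// One unsigned char per byte of a bool; sizeof(bool) is implementation-defined.
using BoolBytes = std::array<unsigned char, sizeof(bool)>;

// Copy the raw object representation of b, with no conversion to 0 or 1.
inline BoolBytes object_representation(bool b) {
    BoolBytes bytes{};
    std::memcpy(bytes.data(), &b, sizeof(bool));
    return bytes;
}
```

On x86-64 System V this yields a single byte holding 0 or 1 for a properly initialized bool, but that is the ABI's promise, not the standard's.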

You've partially "hidden" the UB on this execution path from the compiler with noinline. Even if it doesn't inline, though, interprocedural optimizations could still make a version of the function that depends on the definition of another function. (First, clang is making an executable, not a Unix shared library where symbol-interposition can happen. Second, the definition is inside the class{} definition, so all translation units must have the same definition, like with the inline keyword.)

So a compiler could emit just a ret or ud2 (illegal instruction) as the definition for main, because the path of execution starting at the top of main unavoidably encounters Undefined Behaviour. (Which the compiler can see at compile time if it decided to follow the path through the non-inline constructor.)

Any program that encounters UB is totally undefined for its entire existence. But UB inside a function or if() branch that never actually runs doesn't corrupt the rest of the program. In practice that means that compilers can decide to emit an illegal instruction, or a ret, or not emit anything and fall into the next block / function, for the whole basic block that can be proven at compile time to contain or lead to UB.

GCC and Clang in practice do actually sometimes emit ud2 on UB, instead of even trying to generate code for paths of execution that make no sense. Or for cases like falling off the end of a non-void function, gcc will sometimes omit a ret instruction. If you were thinking that "my function will just return with whatever garbage is in RAX", you are sorely mistaken. Modern C++ compilers don't treat the language like a portable assembly language any more. Your program really has to be valid C++, without making assumptions about how a stand-alone non-inlined version of your function might look in asm.
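For the code in the question, making the program valid C++ is cheap. One possible fix, sketched here as an assumption rather than as the original author's patch, is a default member initializer, so that no constructor can leave the member indeterminate. The member is renamed initializedBool for the sketch; FStruct, destBuffer and Serialize follow the question.

```cpp
#include <cstddef>
#include <cstring>

// Same shape as the struct in the question, but the member now has a
// default member initializer, so the empty constructor still sets it.
struct FStruct {
    bool initializedBool = false;
    FStruct() {}
};

// Zero-filled global buffer of 16 characters, as in the question.
char destBuffer[16];

void Serialize(bool boolValue) {
    // Pick the string, measure it, and copy it without the terminator;
    // destBuffer is zero-filled, so the result is already null-terminated.
    const char* whichString = boolValue ? "true" : "false";
    const std::size_t len = std::strlen(whichString);
    std::memcpy(destBuffer, whichString, len);
}
```

With this change, constructing an FStruct and calling Serialize on its member always reads a real bool value, so the 5U - boolValue shortcut is harmless.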

Another fun example is Why does unaligned access to mmap'ed memory sometimes segfault on AMD64?. x86 doesn't fault on unaligned integers, right? So why would a misaligned uint16_t* be a problem? Because alignof(uint16_t) == 2, and violating that assumption led to a segfault when auto-vectorizing with SSE2.
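The usual way to stay inside the rules when the address may be misaligned is to go through memcpy instead of dereferencing a uint16_t*. A minimal sketch, with load_u16 as a hypothetical helper name:

```cpp
#include <cstdint>
#include <cstring>

// Read a uint16_t from an address that carries no alignment guarantee.
// memcpy makes no alignment assumption, so the compiler cannot build
// optimizations on top of alignof(uint16_t) == 2 here.
inline std::uint16_t load_u16(const unsigned char* p) {
    std::uint16_t value;
    std::memcpy(&value, p, sizeof value);
    return value;
}
```

On x86-64, gcc and clang typically turn this into a single 16-bit load, so the well-defined version should cost nothing compared with the pointer cast.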

See also What Every C Programmer Should Know About Undefined Behavior #1/3, an article by a clang developer.

Key point: if the compiler noticed the UB at compile time, it could "break" (emit surprising asm) the path through your code that causes UB even if targeting an ABI where any bit-pattern is a valid object representation for bool.

Expect total hostility toward many mistakes by the programmer, especially things modern compilers warn about. This is why you should use -Wall and fix warnings. C++ is not a user-friendly language, and something in C++ can be unsafe even if it would be safe in asm on the target you're compiling for. (e.g. signed overflow is UB in C++ and compilers will assume it doesn't happen, even when compiling for 2's complement x86, unless you use clang/gcc -fwrapv.)
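Signed overflow is a good illustration: a tempting test such as a + b < a (for positive b) is itself UB when it would fire, so a compiler may fold it to false. A sketch of a check that only uses in-range arithmetic follows; add_would_overflow is a hypothetical name.

```cpp
#include <climits>

// Returns true when a + b would overflow int. Every intermediate value
// stays within [INT_MIN, INT_MAX], so the check never triggers the UB
// it is guarding against.
constexpr bool add_would_overflow(int a, int b) {
    return (b > 0 && a > INT_MAX - b) ||
           (b < 0 && a < INT_MIN - b);
}
```

Because nothing here overflows, the result does not depend on -fwrapv or on how aggressively the compiler exploits the no-overflow assumption.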

Compile-time-visible UB is always dangerous, and it's really hard to be sure (with link-time optimization) that you've really hidden UB from the compiler and can thus reason about what kind of asm it will generate.

Not to be over-dramatic; often compilers do let you get away with some things and emit code like you're expecting even when something is UB. But maybe it will be a problem in the future if compiler devs implement some optimization that gains more info about value-ranges (e.g. that a variable is non-negative, maybe allowing it to optimize sign-extension to free zero-extension on x86-64). For example, in current gcc and clang, doing tmp = a+INT_MIN doesn't optimize a<0 as always-false, only that tmp is always negative. (Because INT_MIN + a=INT_MAX is negative on this 2's complement target, and a can't be any higher than that.)

So gcc/clang don't currently backtrack to derive range info for the inputs of a calculation, only for the results, based on the assumption of no signed overflow: example on Godbolt. I don't know if this optimization is intentionally "missed" in the name of user-friendliness or what.

Also note that implementations (aka compilers) are allowed to define behaviour that ISO C++ leaves undefined. For example, all compilers that support Intel's intrinsics (like _mm_add_ps(__m128, __m128) for manual SIMD vectorization) must allow forming mis-aligned pointers, which is UB in C++ even if you don't dereference them. __m128i _mm_loadu_si128(const __m128i *) does unaligned loads by taking a misaligned __m128i* arg, not a void* or char*. See the Q&A "Is `reinterpret_cast`ing between hardware vector pointer and the corresponding type an undefined behavior?".

GNU C/C++ also defines the behaviour of left-shifting a negative signed number (even without -fwrapv), separately from the normal signed-overflow UB rules. (This is UB in ISO C++, while right shifts of signed numbers are implementation-defined (logical vs. arithmetic); good quality implementations choose arithmetic on HW that has arithmetic right shifts, but ISO C++ doesn't specify.) This is documented in the GCC manual's Integer section, along with the choices GCC makes for implementation-defined behaviour that the C standards require implementations to define one way or another.

There are definitely quality-of-implementation issues that compiler developers care about; they generally aren't trying to make compilers that are intentionally hostile, but taking advantage of all the UB potholes in C++ (except ones they choose to define) to optimize better can be nearly indistinguishable at times.


Footnote 1: The upper 56 bits can be garbage which the callee must ignore, as usual for types narrower than a register.

(Other ABIs do make different choices here. Some do require narrow integer types to be zero- or sign-extended to fill a register when passed to or returned from functions, like MIPS64 and PowerPC64. See the last section of this x86-64 answer which compares vs. those earlier ISAs.)

For example, a caller might have calculated a & 0x01010101 in RDI and used it for something else, before calling bool_func(a&1). The caller could optimize away the &1 because it already did that to the low byte as part of and edi, 0x01010101, and it knows the callee is required to ignore the high bytes.
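The arithmetic behind that example can be checked with a small model. caller_register and callee_view are hypothetical names standing in for the contents of EDI and for the callee's read of the low byte.

```cpp
#include <cstdint>

// What the caller leaves in the 32-bit register: a masked with a pattern
// that keeps only the low bit of every byte.
constexpr std::uint32_t caller_register(std::uint32_t a) {
    return a & 0x01010101u;
}

// What the callee may look at for a bool arg: only the low 8 bits.
constexpr bool callee_view(std::uint32_t reg) {
    return static_cast<std::uint8_t>(reg) != 0;
}
```

The low byte of a & 0x01010101 is exactly a & 1, so the callee sees the right bool even though the upper bytes of the register may hold other set bits.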

Or if a bool is passed as the 3rd arg, maybe a caller optimizing for code-size loads it with mov dl, [mem] instead of movzx edx, [mem], saving 1 byte at the cost of a false dependency on the old value of RDX (or other partial-register effect, depending on CPU model). Or for the first arg, mov dil, byte [r10] instead of movzx edi, byte [r10], because both require a REX prefix anyway.

This is why clang emits movzx eax, dil in Serialize, instead of sub eax, edi. (For integer args, clang violates this ABI rule, instead depending on the undocumented behaviour of gcc and clang to zero- or sign-extend narrow integers to 32 bits; see the Q&A "Is a sign or zero extension required when adding a 32bit offset to a pointer for the x86-64 ABI?". So I was interested to see that it doesn't do the same thing for bool.)


Footnote 2: After branching, you'd just have a 4-byte mov-immediate, or a 4-byte + 1-byte store. The length is implicit in the store widths + offsets.

OTOH, glibc memcpy will do two 4-byte loads/stores with an overlap that depends on length, so this really does end up making the whole thing free of conditional branches on the boolean. See the L(between_4_7): block in glibc's memcpy/memmove. Or at least, go the same way for either boolean in memcpy's branching to select a chunk size.
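The overlapping-copy trick can be sketched in portable C++. This is an illustration of the idea only, not glibc's code (which is hand-written asm); copy_4_to_7 is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy n bytes for 4 <= n <= 7 using two 4-byte moves that may overlap:
// one covers bytes [0, 4), the other covers bytes [n-4, n).
// Both loads are done before either store, and no branch depends on n.
inline void copy_4_to_7(unsigned char* dst, const unsigned char* src, std::size_t n) {
    assert(n >= 4 && n <= 7);
    std::uint32_t head;
    std::uint32_t tail;
    std::memcpy(&head, src, 4);          // first 4 bytes
    std::memcpy(&tail, src + n - 4, 4);  // last 4 bytes
    std::memcpy(dst, &head, 4);
    std::memcpy(dst + n - 4, &tail, 4);
}
```

For "true" (n = 4) the two moves coincide; for "false" (n = 5) they overlap by three bytes. Either way the same instructions run, which is why the length computed from the bool only shows up as an address offset.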

If inlining, you could use 2x mov-immediate + cmov and a conditional offset, or you could leave the string data in memory.

Or if tuning for Intel Ice Lake (with the Fast Short REP MOV feature), an actual rep movsb might be optimal. glibc memcpy might start using rep movsb for small sizes on CPUs with that feature, saving a lot of branching.


Tools for detecting UB and usage of uninitialized values

In gcc and clang, you can compile with -fsanitize=undefined to add run-time instrumentation that will warn or error out on UB that happens at runtime. That won't catch uninitialized variables, though. (Because it doesn't increase type sizes to make room for an "uninitialized" bit.)

See https://developers.redhat.com/blog/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/

To find usage of uninitialized data, there's Memory Sanitizer in clang/LLVM. (Address Sanitizer targets other memory errors, such as out-of-bounds and use-after-free accesses.) https://github.com/google/sanitizers/wiki/MemorySanitizer shows examples of clang -fsanitize=memory -fPIE -pie detecting uninitialized memory reads. It might work best if you compile without optimization, so all reads of variables end up actually loading from memory in the asm. They show it being used at -O2 in a case where the load wouldn't optimize away. I haven't tried it myself. (In some cases, e.g. not initializing an accumulator before summing an array, clang -O3 will emit code that sums into a vector register that it never initialized. So with optimization, you can have a case where there's no memory read associated with the UB. But -fsanitize=memory changes the generated asm, and might result in a check for this.)

It will tolerate copying of uninitialized memory, and also simple logic and arithmetic operations with it. In general, MemorySanitizer silently tracks the spread of uninitialized data in memory, and reports a warning when a code branch is taken (or not taken) depending on an uninitialized value.

MemorySanitizer implements a subset of functionality found in Valgrind (Memcheck tool).

It should work for this case because the call to glibc memcpy with a length calculated from uninitialized memory will (inside the library) result in a branch based on length. If it had inlined a fully branchless version that just used cmov, indexing, and two stores, it might not have worked.

Valgrind's memcheck will also look for this kind of problem, again not complaining if the program simply copies around uninitialized data. But it says it will detect when a "Conditional jump or move depends on uninitialised value(s)", to try to catch any externally-visible behaviour that depends on uninitialized data.

Perhaps the idea behind not flagging just a load is that structs can have padding, and copying the whole struct (including padding) with a wide vector load/store is not an error even if the individual members were only written one at a time. At the asm level, the information about what was padding and what is actually part of the value has been lost.
