有趣的汇编,用于比较std :: primitive类型的可选 [英] Intriguing assembly for comparing std::optional of primitive types

查看:85
本文介绍了有趣的汇编,用于比较std :: primitive类型的可选的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

Valgrind突然出现了有条件的跳跃或移动取决于我的一项单元测试中的未初始化值.

检查程序集,我发现以下代码:

bool operator==(MyType const& left, MyType const& right) {
    // ... some code ...
    if (left.getA() != right.getA()) { return false; }
    // ... some code ...
    return true;
}

MyType::getA() const -> std::optional<std::uint8_t>生成以下程序集:

   0x00000000004d9588 <+108>:   xor    eax,eax
   0x00000000004d958a <+110>:   cmp    BYTE PTR [r14+0x1d],0x0
   0x00000000004d958f <+115>:   je     0x4d9597 <... function... +123>
x  0x00000000004d9591 <+117>:   mov    r15b,BYTE PTR [r14+0x1c]
x  0x00000000004d9595 <+121>:   mov    al,0x1

   0x00000000004d9597 <+123>:   xor    edx,edx
   0x00000000004d9599 <+125>:   cmp    BYTE PTR [r13+0x1d],0x0
   0x00000000004d959e <+130>:   je     0x4d95ae <... function... +146>
x  0x00000000004d95a0 <+132>:   mov    dil,BYTE PTR [r13+0x1c]
x  0x00000000004d95a4 <+136>:   mov    dl,0x1
x  0x00000000004d95a6 <+138>:   mov    BYTE PTR [rsp+0x97],dil

   0x00000000004d95ae <+146>:   cmp    al,dl
   0x00000000004d95b0 <+148>:   jne    0x4da547 <... function... +4139>

   0x00000000004d95b6 <+154>:   cmp    r15b,BYTE PTR [rsp+0x97]
   0x00000000004d95be <+162>:   je     0x4d95c8 <... function... +172>

    => Jump on uninitialized

   0x00000000004d95c0 <+164>:   test   al,al
   0x00000000004d95c2 <+166>:   jne    0x4da547 <... function... +4139>

在未设置可选选项的情况下,我用x标记了未执行(跳过)的语句.

成员A在这里与MyType的偏移量为0x1c.检查std::optional的布局,我们看到:

  • +0x1d对应于bool _M_engaged
  • +0x1c对应于std::uint8_t _M_payload(在匿名联合内部).

std::optional的关注代码为:

constexpr explicit operator bool() const noexcept
{ return this->_M_is_engaged(); }

// Comparisons between optional values.
template<typename _Tp, typename _Up>
constexpr auto operator==(const optional<_Tp>& __lhs, const optional<_Up>& __rhs) -> __optional_relop_t<decltype(declval<_Tp>() == declval<_Up>())>
{
    return static_cast<bool>(__lhs) == static_cast<bool>(__rhs)
         && (!__lhs || *__lhs == *__rhs);
}

在这里,我们可以看到gcc对代码进行了相当大的改变;如果我正确理解的话,在C语言中会给出:

char rsp[0x148]; // simulate the stack

/* comparisons of prior data members */

/*
0x00000000004d9588 <+108>:   xor    eax,eax
0x00000000004d958a <+110>:   cmp    BYTE PTR [r14+0x1d],0x0
0x00000000004d958f <+115>:   je     0x4d9597 <... function... +123>
0x00000000004d9591 <+117>:   mov    r15b,BYTE PTR [r14+0x1c]
0x00000000004d9595 <+121>:   mov    al,0x1
*/

int eax = 0;
if (__lhs._M_engaged == 0) { goto b123; }
bool r15b = __lhs._M_payload;
eax = 1;

b123:
/*
0x00000000004d9597 <+123>:   xor    edx,edx
0x00000000004d9599 <+125>:   cmp    BYTE PTR [r13+0x1d],0x0
0x00000000004d959e <+130>:   je     0x4d95ae <... function... +146>
0x00000000004d95a0 <+132>:   mov    dil,BYTE PTR [r13+0x1c]
0x00000000004d95a4 <+136>:   mov    dl,0x1
0x00000000004d95a6 <+138>:   mov    BYTE PTR [rsp+0x97],dil
*/

int edx = 0;
if (__rhs._M_engaged == 0) { goto b146; }
rdi = __rhs._M_payload;
edx = 1;
rsp[0x97] = rdi;

b146:
/*
0x00000000004d95ae <+146>:   cmp    al,dl
0x00000000004d95b0 <+148>:   jne    0x4da547 <... function... +4139>
*/

if (eax != edx) { goto end; } // return false

/*
0x00000000004d95b6 <+154>:   cmp    r15b,BYTE PTR [rsp+0x97]
0x00000000004d95be <+162>:   je     0x4d95c8 <... function... +172>
*/

//  Flagged by valgrind
if (r15b == rsp[097]) { goto b172; } // next data member

/*
0x00000000004d95c0 <+164>:   test   al,al
0x00000000004d95c2 <+166>:   jne    0x4da547 <... function... +4139>
*/

if (eax == 1) { goto end; } // return false

b172:

/* comparison of following data members */

end:
    return false;

等同于:

//  Note how the operands of || are inversed.
return static_cast<bool>(__lhs) == static_cast<bool>(__rhs)
         && (*__lhs == *__rhs || !__lhs);

如果奇怪,我认为该程序集是正确的.也就是说,据我所知,未初始化值之间的比较结果实际上不会影响该函数的结果(与C或C ++不同,我确实希望比较x86程序集中的垃圾不是UB):

  1. 如果一个可选选项是nullopt,而另一个已设置,则+148处的条件跳转将跳转到end(return false),确定.
  2. 如果同时设置了两个可选参数,那么比较将读取初始化值,确定.

所以唯一感兴趣的情况是两个可选选项均为nullopt时:

  • 如果两个值比较相等,则代码得出的结论是可选参数相等,这是正确的,因为它们都是nullopt
  • 否则,代码得出的结论是,如果__lhs._M_engaged为false(真),则可选参数相等.

无论哪种情况,代码都得出结论,当两个都是nullopt时,这两个可选参数是相等的; CQFD.


这是我看到的gcc生成明显良性"的未初始化读取的第一个实例,因此我有几个问题:

  1. 未初始化的程序集(x84_64)是否可以读取?
  2. 这是优化失败的征兆(反转||),可能在非良性情况下触发吗?

就目前而言,我倾向于使用optimize(1)注释一些功能,作为一种变通方法,以防止启动优化.幸运的是,已识别的功能对性能并不重要.


环境:

  • 编译器:gcc 7.3
  • 编译标志:-std=c++17 -g -Wall -Werror -O3 -flto(+包括在内)
  • 链接标志:-O3 -flto(+适当的库)

注意:可以使用-O2而不是-O3出现,但永远不能没有-flto.


有趣的事实

在完整代码中,此模式在上述函数中针对各种有效负载:std::uint8_tstd::uint32_tstd::uint64_t甚至是struct { std::int64_t; std::int8_t; }出现32次.

它仅出现在几个大型的[c30>比较类型中,具有约40个数据成员,而没有出现在较小的类型中.对于std::optional<std::string_view>,即使在那些特定的功能(调用std::char_traits进行比较)中,它也不会出现.

最后,令人生厌的是,将所讨论的函数隔离在其自己的二进制文件中,使问题"消失.神话般的MCVE难以捉摸.

解决方案

在x86 asm中,发生的最坏情况是单个寄存器的值未知(或者您不知道它具有两个可能的值中的旧值)或新的(如果可能的话).但是如果您的代码不依赖于该寄存器值,就可以了,,与C ++不同. C ++ UB意味着从理论上来说,整个程序在一个有符号整数溢出之后就已经完全崩溃了,甚至在此之前,编译器可以看到的代码路径都将导致UB.在asm中不会发生这种情况,至少在没有特权的用户空间代码中不会发生这种情况.

(您可能需要做一些事情,通过以怪异的方式设置控制寄存器或将不一致的内容放入页表或描述符中,从而在内核中基本上引起系统范围内的不可预测的行为,但这不会发生.即使您正在编译内核代码,也是如此.)


某些ISA具有不可预测的行为",就像早期的ARM,如果您对乘法的多个操作数使用相同的寄存器,则行为是不可预测的.如果IDK允许中断管道并破坏其他寄存器,或者仅限于意外的乘法结果,则为IDK.后者将是我的猜测.

或者是MIPS,如果将分支放在分支延迟插槽中,则该行为是不可预测的. (由于分支延迟插槽,处理异常情况很麻烦……).但是大概仍然存在限制,您不能使计算机崩溃或破坏其他进程(在Unix之类的多用户系统中,如果没有特权的用户空间进程可能破坏其他用户的生命,那将是不好的.)

非常早的MIPS还具有加载延迟插槽,并增加了延迟插槽:您无法在下一条指令中使用加载结果.如果您过早地读取寄存器,可能只是获得了寄存器的旧值,或者仅仅是垃圾. MIPS =最小互锁管线阶段;他们想将停顿拖到软件上,但是事实证明,当编译器找不到对下一步膨胀的二进制文件有用的东西时,添加NOP会导致整体代码变慢,而在必要时会导致硬件停顿.但是,我们坚持使用分支延迟插槽,因为删除它们会改变ISA,与放宽对早期软件无法做到的限制不同.

Valgrind picked up a flurry Conditional jump or move depends on uninitialised value(s) in one of my unit tests.

Inspecting the assembly, I realized that the following code:

bool operator==(MyType const& left, MyType const& right) {
    // ... some code ...
    if (left.getA() != right.getA()) { return false; }
    // ... some code ...
    return true;
}

Where MyType::getA() const -> std::optional<std::uint8_t>, generated the following assembly:

   0x00000000004d9588 <+108>:   xor    eax,eax
   0x00000000004d958a <+110>:   cmp    BYTE PTR [r14+0x1d],0x0
   0x00000000004d958f <+115>:   je     0x4d9597 <... function... +123>
x  0x00000000004d9591 <+117>:   mov    r15b,BYTE PTR [r14+0x1c]
x  0x00000000004d9595 <+121>:   mov    al,0x1

   0x00000000004d9597 <+123>:   xor    edx,edx
   0x00000000004d9599 <+125>:   cmp    BYTE PTR [r13+0x1d],0x0
   0x00000000004d959e <+130>:   je     0x4d95ae <... function... +146>
x  0x00000000004d95a0 <+132>:   mov    dil,BYTE PTR [r13+0x1c]
x  0x00000000004d95a4 <+136>:   mov    dl,0x1
x  0x00000000004d95a6 <+138>:   mov    BYTE PTR [rsp+0x97],dil

   0x00000000004d95ae <+146>:   cmp    al,dl
   0x00000000004d95b0 <+148>:   jne    0x4da547 <... function... +4139>

   0x00000000004d95b6 <+154>:   cmp    r15b,BYTE PTR [rsp+0x97]
   0x00000000004d95be <+162>:   je     0x4d95c8 <... function... +172>

    => Jump on uninitialized

   0x00000000004d95c0 <+164>:   test   al,al
   0x00000000004d95c2 <+166>:   jne    0x4da547 <... function... +4139>

Where I marked with x the statements that are not executed (jumped over) in the case where the optional is NOT set.

The member A here is at offset 0x1c into MyType. Checking the layout of std::optional we see that:

  • +0x1d corresponds to bool _M_engaged,
  • +0x1c corresponds to std::uint8_t _M_payload (inside an anonymous union).

The code of interest for std::optional is:

constexpr explicit operator bool() const noexcept
{ return this->_M_is_engaged(); }

// Comparisons between optional values.
template<typename _Tp, typename _Up>
constexpr auto operator==(const optional<_Tp>& __lhs, const optional<_Up>& __rhs) -> __optional_relop_t<decltype(declval<_Tp>() == declval<_Up>())>
{
    return static_cast<bool>(__lhs) == static_cast<bool>(__rhs)
         && (!__lhs || *__lhs == *__rhs);
}

Here, we can see that gcc transformed the code quite substantially; if I understand it correctly, in C this gives:

char rsp[0x148]; // simulate the stack

/* comparisons of prior data members */

/*
0x00000000004d9588 <+108>:   xor    eax,eax
0x00000000004d958a <+110>:   cmp    BYTE PTR [r14+0x1d],0x0
0x00000000004d958f <+115>:   je     0x4d9597 <... function... +123>
0x00000000004d9591 <+117>:   mov    r15b,BYTE PTR [r14+0x1c]
0x00000000004d9595 <+121>:   mov    al,0x1
*/

int eax = 0;
if (__lhs._M_engaged == 0) { goto b123; }
bool r15b = __lhs._M_payload;
eax = 1;

b123:
/*
0x00000000004d9597 <+123>:   xor    edx,edx
0x00000000004d9599 <+125>:   cmp    BYTE PTR [r13+0x1d],0x0
0x00000000004d959e <+130>:   je     0x4d95ae <... function... +146>
0x00000000004d95a0 <+132>:   mov    dil,BYTE PTR [r13+0x1c]
0x00000000004d95a4 <+136>:   mov    dl,0x1
0x00000000004d95a6 <+138>:   mov    BYTE PTR [rsp+0x97],dil
*/

int edx = 0;
if (__rhs._M_engaged == 0) { goto b146; }
rdi = __rhs._M_payload;
edx = 1;
rsp[0x97] = rdi;

b146:
/*
0x00000000004d95ae <+146>:   cmp    al,dl
0x00000000004d95b0 <+148>:   jne    0x4da547 <... function... +4139>
*/

if (eax != edx) { goto end; } // return false

/*
0x00000000004d95b6 <+154>:   cmp    r15b,BYTE PTR [rsp+0x97]
0x00000000004d95be <+162>:   je     0x4d95c8 <... function... +172>
*/

//  Flagged by valgrind
if (r15b == rsp[097]) { goto b172; } // next data member

/*
0x00000000004d95c0 <+164>:   test   al,al
0x00000000004d95c2 <+166>:   jne    0x4da547 <... function... +4139>
*/

if (eax == 1) { goto end; } // return false

b172:

/* comparison of following data members */

end:
    return false;

Which is equivalent to:

//  Note how the operands of || are inversed.
return static_cast<bool>(__lhs) == static_cast<bool>(__rhs)
         && (*__lhs == *__rhs || !__lhs);

I think that the assembly is correct, if strange. That is, as far as I can see, the result of the comparison between uninitialized values does not actually influence the result of the function (and unlike C or C++, I do expect comparing junk in x86 assembly NOT to be UB):

  1. If one optional is nullopt and the other is set, then the conditional jump at +148 jumps to end (return false), OK.
  2. If both optionals are set, then the comparison reads initialized values, OK.

So the only case of interest is when both optionals are nullopt:

  • if the values compare equal, then the code concludes that the optionals are equal, which is true since they are both nullopt,
  • otherwise, the code concludes that the optionals are equal if __lhs._M_engaged is false, which is true.

In either case, the code therefore concludes that both optionals are equal when both are nullopt; CQFD.


This is the first instance I see of gcc generating apparently "benign" uninitialized reads, and therefore I have a few questions:

  1. Are uninitialized reads OK in assembly (x84_64)?
  2. Is this the syndrome of a failed optimization (reversing ||) which could trigger in non-benign circumstances?

For now, I am leaning toward annotating the few functions with optimize(1) as a work-around to prevent optimizations from kicking in. Fortunately, the identified functions are not performance critical.


Environment:

  • compiler: gcc 7.3
  • compile flags: -std=c++17 -g -Wall -Werror -O3 -flto (+ appropriate includes)
  • link flags: -O3 -flto (+ appropriate libraries)

Note: can appear with -O2 instead of -O3, but never without -flto.


Fun facts

In the full code, this pattern appears 32 times in the function outlined above, for various payloads : std::uint8_t, std::uint32_t, std::uint64_t and even a struct { std::int64_t; std::int8_t; }.

It only appears in a few big operator== comparing types with ~40 data members, not in smaller ones. And it does not appear for the std::optional<std::string_view> even in those specific functions (which call into std::char_traits for the comparison).

Finally, infuriatingly, isolating the function in question in its own binary makes the "problem" disappear. The mythical MCVE is proving elusive.

解决方案

In x86 asm, the worst that happens is that a single register has an unknown value (or you don't know which of two possible values it has, old or new, in case of possible memory-ordering). But if your code doesn't depend on that register value, you're fine, unlike in C++. C++ UB means your whole program is in theory completely hosed after one signed-integer overflow, and even before that along code-paths that the compiler can see will lead to UB. Nothing like that ever happens in asm, at least not in unprivileged user-space code.

(There might be a few things you can do to basically cause system-wide unpredictable behaviour in the kernel, by setting control registers in weird ways or putting inconsistent things into page tables or descriptors, but that's not going to happen from something like this, even if you were compiling kernel code.)


Some ISAs have "unpredictable behaviour", like early ARM if you use the same register for multiple operands of a multiply, the behaviour is unpredictable. IDK if this allows breaking the pipeline and corrupting other registers, or if it's restricted to just an unexpected multiply result. The latter would be my guess.

Or MIPS, if you put a branch in the branch-delay slot, the behaviour is unpredictable. (Handling exceptions is messy because of branch-delay slots...). But presumably there are still limits and you can't crash the machine or break other processes (in a multi-user system like Unix, it would be bad if an unprivileged user-space process could break anything for other users).

Very early MIPS also had load-delay slots, and multiply delay slots: you couldn't use the result of a load in the next instruction. Presumably you might get the old value of the register if you read it too early, or maybe just garbage. MIPS = Minimally Interlocked Pipeline Stages; they wanted to offload the stalling to software, but it turned out that adding a NOP when the compiler couldn't find anything useful to do next bloated binaries and led to slower overall code vs. having the hardware stall when necessary. But we're stuck with branch-delay slots because removing them would change the ISA, unlike relaxing a restriction on something early software didn't do.

这篇关于有趣的汇编,用于比较std :: primitive类型的可选的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆