-!(条件)是从布尔值(掩码布尔值)获取全位向量的正确方法吗? [英] Is -!(condition) a correct way to obtain a full-bitvector from a boolean (mask-boolean)?

查看:134
本文介绍了-!(条件)是从布尔值(掩码布尔值)获取全位向量的正确方法吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在从高性能代码中删除条件分支时,将真正的布尔值转换为unsigned long i = -1以设置所有位可能会很有用.

In removing conditional branches from high-performance code, converting a true boolean to unsigned long i = -1 to set all bits can be useful.

我想出了一种方法,该方法从int b(或bool b)的输入中获取10值的整数掩码布尔值:

I came up with a way to obtain this integer-mask-boolean from input of a int b (or bool b) taking values either 1 or 0:

unsigned long boolean_mask = -(!b);

要获得相反的值:

unsigned long boolean_mask = -b;

以前有人看过这种结构吗?我在做某事吗?当将-1的int值(我假设-b-(!b)确实会产生)提升为更大的无符号int类型时,是否可以保证设置所有位?

Has anybody seen this construction before? Am I on to something? When a int value of -1 (which I assume -b or -(!b) does produce) is promoted to a bigger unsigned int type is it guaranteed to set all the bits?

这里是上下文:

uint64_t ffz_flipped = ~i&~(~i-1); // least sig bit unset
// only set our least unset bit if we are not pow2-1
i |= (ffz_flipped < i) ? ffz_flipped : 0;

下一次问类似问题之前,我将检查生成的asm.听起来编译器很可能不会在这里增加分支的负担.

I will inspect the generated asm before asking questions like this next time. Sounds very likely the compiler will not burden the cpu with a branch here.

推荐答案

您应该问自己的问题是:如果您写:

The question you should be asking yourself is this: If you write:

int it_was_true = b > c;

然后it_was_true将为1或0.但是那1是哪里来的?

机器的指令集不包含以下形式的指令:

The machine's instruction set doesn't contain an instruction of the form:

Compare R1 with R2 and store either 1 or 0 in R3

,或者实际上是类似的东西. (我在此答案的结尾处在SSE上加了一条注释,以说明前面的陈述不太正确.)该机器具有一个内部条件寄存器,该寄存器由几个条件位和比较指令组成-还有许多其他条件算术运算-使这些条件位以特定方式修改.随后,您可以根据某些条件位或条件负载,有时还可以进行其他条件操作,来进行条件分支.

or, indeed, anything like that. (I put a note on SSE at the end of this answer, illustrating that the former statement is not quite true.) The machine has an internal condition register, consisting of several condition bits, and the compare instruction -- and a number of other arithmetic operations -- cause those condition bits to be modified in specific ways. Subsequently, you can do a conditional branch, based on some condition bits, or a conditional load, and sometimes other conditional operations.

因此,实际上,将1存储在变量中的效率可能比直接执行某些条件运算要低得多.可能是(但可能不是),因为编译器(或至少是编写编译器的人)可能比您聪明,并且它可能只记得应该将1放入it_was_true中,所以当您实际上可以检查该值,编译器可以发出适当的分支或任何其他内容.

So actually, it could be a lot less efficient to store that 1 in a variable than it would have been to have directly done some conditional operation. Could have been, but maybe not, because the compiler (or at least, the guys who wrote the compiler) may well be cleverer than you, and it might just remember that it should have put a 1 into it_was_true so that when you actually get around to checking the value, the compiler can emit an appropriate branch or whatever.

因此,谈到聪明的编译器,您应该仔细看一下由以下代码产生的汇编代码:

So, speaking of clever compilers, you should take a careful look at the assembly code produced by:

uint64_t ffz_flipped = ~i&~(~i-1);

看看该表达式,我可以算出五种运算:三种按位取反,一种按位合取(and)和一种减法.但是您不会在程序集输出中找到五个操作(至少,如果您使用gcc -O3).您会发现三个.

Looking at that expression, I can count five operations: three bitwise negations, one bitwise conjunction (and), and one subtract. But you won't find five operations in the assembly output (at least, if you use gcc -O3). You'll find three.

在查看汇编输出之前,让我们做一些基本的代数.这是最重要的身份:

Before we look at the assembly output, let's do some basic algebra. Here's the most important identity:

-X == ~X + 1

您知道这是为什么吗? -X以2的补码表示2n - X,这是另一种说法,其中n是单词中的位数.实际上,这就是为什么它被称为"2的补码"的原因. ~X呢?好吧,我们可以认为这是从2的相应幂中减去X中的每个位的结果.例如,如果我们单词中有四个位,并且X0101(即5或2 2 + 2 0 ),则~X1010,我们可以将其视为23×(1-0) + 22×(1-1) + 21×(1-0) + 20×(1-1),与1111 − 0101完全相同.或者,换句话说:

Can you see why that's true? -X, in 2's complement, is just another way of saying 2n - X, where n is the number of bits in the word. In fact, that's why it's called "2's complement". What about ~X? Well, we can think of that as the result of subtracting every bit in X from the corresponding power of 2. For example, if we have four bits in our word, and X is 0101 (which is 5, or 22 + 20), then ~X is 1010 which we can think of as 23×(1-0) + 22×(1-1) + 21×(1-0) + 20×(1-1), which is exactly the same as 1111 − 0101. Or, in other words:

 −X == 2n − X
  ~X == (2n−1) − X 意思就是
  ~X == (−X) − 1

 −X == 2n − X
  ~X == (2n−1) − X which means that
  ~X == (−X) − 1

记住我们有

ffz_flipped = ~i&~(~i-1);

但是我们现在知道可以将〜(〜i− 1)更改为minus操作:

But we now know that we can change ~(~i−1) into minus operations:

~(~i−1)
== −(~i−1) − 1
== −(−i - 1 - 1) − 1
== (i + 2) - 1
== i + 1

~(~i−1)
== −(~i−1) − 1
== −(−i - 1 - 1) − 1
== (i + 2) - 1
== i + 1

那太酷了!我们本来可以写:

How cool is that! We could have just written:

ffz_flipped = ~i & (i + 1);

这只是三个操作,而不是五个.

which is only three operations, instead of five.

现在,我不知道您是否遵循该要求,花了我一些时间才能正确设置,但现在让我们来看一下gcc的输出:

Now, I don't know if you followed that, and it took me a bit of time to get it right, but now let's look at gcc's output:

    leaq    1(%rdi), %rdx     # rdx = rdi + 1 
    movq    %rdi, %rax        # rax = rdi                                        
    notq    %rax              # rax = ~rax                             
    andq    %rax, %rdx        # rdx &= rax

所以gcc自己去解决了所有问题.

So gcc just went and figured all that out on its own.

关于SSE的承诺说明:事实证明SSE可以进行并行比较,甚至可以在两个16字节寄存器之间同时进行16字节比较.条件寄存器并不是为此而设计的,无论如何,没有人愿意在不需要时进行分支.因此,CPU实际上确实将一个SSE寄存器(一个16字节的向量,或8个字"或4个双字"的向量,无论该操作说什么)更改为一个布尔指示符的向量.但是它不使用1表示真实.而是使用所有1的掩码.为什么?因为程序员可能要用比较结果做的下一件事是使用它来掩盖值,我认为这正是您打算使用-(!B)技巧进行的操作,除了并行流版本外

The promised note about SSE: It turns out that SSE can do parallel comparisons, even to the point of doing 16 byte-wise comparisons at a time between two 16-byte registers. Condition registers weren't designed for that, and anyway no-one wants to branch when they don't have to. So the CPU does actually change one of the SSE registers (a vector of 16 bytes, or 8 "words" or 4 "double words", whatever the operation says) into a vector of boolean indicators. But it doesn't use 1 for true; instead, it uses a mask of all 1s. Why? Because it's likely that the next thing the programmer is going to do with that comparison result is use it to mask out values, which I think is just exactly what you were planning to do with your -(!B) trick, except in the parallel streaming version.

因此,请放心,它已被覆盖.

So, rest assured, it's been covered.

这篇关于-!(条件)是从布尔值(掩码布尔值)获取全位向量的正确方法吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆