汉明重量(1的个数)将C与组件混合 [英] Hamming weight ( number of 1 in a number) mixing C with assembly

查看:102
本文介绍了汉明重量(1的个数)将C与组件混合的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试计算数组中数字1的个数.

I'm trying to count how many number 1, are in the numbers of an array.

首先,我在C lenguaje中有一个代码(正常工作):

First I have a code in C lenguaje(work ok):

int popcount2(int* array, int len){
    int i;
    unsigned x;
    int result=0;
    for (i=0; i<len; i++){
        x = array[i];
        do{
           result+= x & 0x1;
           x>>= 1;
       } while(x);
    }
return result;
}

现在,我需要使用3-6行代码将do-while循环转换为Assembly.我已经写了一些代码,但是结果不正确.(我是汇编世界中的新手)

Now I need to translate the do-while loop into Assembly using 3-6 lines of code. I have write some code but the result is not correct.( I'm new in the assembly world )

int popcount3(int* array, int len){
int  i;
unsigned x;
int result=0;   
for (i=0; i<len; i++){
    x = array[i];
    asm(
    "ini3:               \n"
        "adc $0,%[r]     \n"
        "shr %[x]        \n"
        "jnz ini3        \n"

        : [r]"+r" (result)
        : [x] "r" (x)       );
  }
}

我正在将GCC(在Linux上)与Intel处理器配合使用.

I'm using GCC ( on linux) with an Intel processor.

推荐答案

您最好的想法是将内置的popcount函数用作

The nicest think you can do is using built in popcount function as suggested by Paul R, but since you need to write it in assembly, this worked for me:

asm (
"start:                  \n"
        "and %0, %1      \n"
        "jz end          \n"
        "shr $0, %1      \n"
        "jnc start       \n"
        "inc %1          \n"
        "jmp start       \n"
"end:                    \n"
        : "+g" (result),
          "+r" (x)
        :
        : "cc"
);

在前两行中,您只需检查x的内容(如果它为零Jump Zero,则结束).比您将x右移一位,然后:

At first two lines you just check the contents of x (and go to end if it's zero Jump Zero). Than you shift x one bit to right and:

在移位操作结束时, CF标志包含最后一位移出了目标操作数. *

如果未设置CF,则只需开始(Jump Not Carry),否则递增结果,然后开始.

If there's no CF set, just go to start (Jump Not Carry) else increment result and then go to start.

关于组装的美好想法是,您可以通过多种方式来做事...

And the beautiful think about assembly is that you can do things in so many ways...

asm (
"start:                  \n"
        "shr $1, %1      \n"
        "jnc loop_cond   \n"
        "inc %0          \n"
        "and %1, %1      \n"
"loop_cond:              \n"
        "jnz start       \n"

        : "+g" (result),
          "+r" (x)
        :
        : "cc"
);

如果没有CF,请再次使用SHift Right指令,进入循环条件.

Here you again use SHift Right instruction, if no CF is present just go to loop condition.

否则,再次增加结果并调用二进制文件AND(INC 确实修改ZF).

Otherwise again increment result and call binary AND (INC does modify ZF).

我很好奇如何在3条指令中做到这一点(我想如果不可能的话,您的老师不会给您3的下限) 我意识到x86也具有 LOOP指令:

每次执行LOOP指令时,计数寄存器都会递减,然后检查是否为0.如果计数为0,则循环终止,程序继续执行LOOP指令之后的指令.如果计数不为零,则执行到目标(目标)操作数的近跳转,这大概是循环开始时的指令. *

您可以使用 GCC输入约束:

c-c寄存器.

asm (
"start:              \n"
    "shr $1, %1      \n"
    "adc $0, %0      \n"
    "loop start      \n"

    : "+g" (result)
    : "r" (x),
      "c" (8)             // Assuming 8b type (char)
);

只需确保可以编译为正确的汇编即可:

Just to make sure it compiles to proper assembly:

0x000000000040051f <+25>:   mov    $0x8,%ecx
0x0000000000400524 <+30>:   mov    -0x8(%rbp),%eax
0x0000000000400527 <+33>:   shr    %edx
0x0000000000400529 <+35>:   adc    $0x0,%eax
0x000000000040052c <+38>:   loop   0x400527 <main+33>

我认为第一个应该具有更好的性能,特别是如果仅设置1位,则此方法始终会执行k*8迭代.

我知道您必须使用循环,但这只是为了好玩...使用 SSE4扩展名您可以通过一条指令POPCNT来做到这一点:

I know you have to use a loop, but just for fun... With SSE4 extension you could do this by just one instruction POPCNT:

此指令计算第二个操作数(源)中设置为1的位数,并返回第一个操作数(目标寄存器)中的计数. *

认为(我的笔记本电脑上有一个很旧的CPU,所以我无法为您测试一下),您应该可以只用一个简单的指令就可以做到这一点:

I think (I have a quite old CPU on my notebook, so I can't test this for you) you should be able to do this with just one simple instruction:

asm (   
    "POPCNT %1, %0   \n"
    : "=r" (result)
    : "mr" (x)
    : "cc"                                                                                                                                       
);

(如果您尝试这样做并且确实拥有SSE4扩展名,请告诉我它是否有效)

我已经测量了进行1亿次弹出式计数所需的时间,并将我的第一种和第二种方法与 David Wohlferd's 进行比较. [原始数据]

I've measured times required to do 100,000,000 popcounts comparing my first and second methods with David Wohlferd's. [Raw data]

+--------------+------------+------------+------------+
|              | 0x00000000 | 0x80000001 | 0xffffffff |
+--------------+------------+------------+------------+
| 1st solution |  0.543     |  5.040     |  3.833     |
| LOOP         | 11.530     | 11.523     | 11.523     |
| Davids       |  0.750     |  4.893     |  4.890     |
+--------------+------------+------------+------------+

如果有人可以将这3个与SSE4的POPCNT指令进行比较,我会很高兴的.

If anyone can compare these 3 with SSE4's POPCNT instruction I'd be glad.

这篇关于汉明重量(1的个数)将C与组件混合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆