最快海明距离C实现的追捕 [英] The hunt for the fastest Hamming Distance C implementation

查看:332
本文介绍了最快海明距离C实现的追捕的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想找到长度相等的两个字符串多少个不同的人物都有。我发现,异或算法被认为是最快的,但他们返回位pssed距离前$ P $。我想在人物pssed结果前$ P $。假设宠物和坑有距离1 EX pressed中的字符,但e和我可能有两个不同的位,所以异或回报2。

I want to find how many different characters two strings of equal length have. I have found that xoring algorithms are considered to be the fastest, but they return distance expressed in bits. I want the results expressed in characters. Suppose that "pet" and "pit" have distance 1 expressed in characters but 'e' and 'i' might have two different bits, so xoring returns 2.

我写的功能是:

// na = length of both strings
unsigned int HammingDistance(const char* a, unsigned int na, const char* b) {

    unsigned int num_mismatches = 0;
    while (na) {
        if (*a != *b)
            ++num_mismatches;

        --na;
        ++a;
        ++b;
    }

    return num_mismatches;
}

难道成为任何更快吗?
也许使用一些较低级别的命令或执行不同的算法?

Could it become any faster? Maybe using some lower level commands or implementing a different algorithm?

系统:GCC 4.7.2英特尔至强X5650

感谢您

推荐答案

如果该字符串以零填充永远是32个字节,其地址为16的整数倍,你可以做这样的事情:(code既不测试或异形)

If the strings are padded with zero to always be 32 bytes and their addresses are 16-aligned, you could do something like this: (code neither tested nor profiled)

movdqa xmm0, [a]
movdqa xmm1, [a + 16]
pcmpeqb xmm0, [b]
pcmpeqb xmm1, [b + 16]
pxor xmm2, xmm2
psadbw xmm0, xmm2
psadbw xmm1, xmm2
pextrw ax, xmm0, 0
pextrw dx, xmm1, 0
add ax, dx
movsx eax, ax
neg eax

但如果串通常是微小的,它做了很多不必要的工作,它可能没有任何快。它应该是更快,如果字符串通常是(几乎)虽然32个字节。

But if the strings are usually tiny, it does a lot of unnecessary work and it may not be any faster. It should be faster if the strings are usually (nearly) 32 bytes though.

编辑:之前我看到你更新的评论我写了这个答案 - 如果字符串通常是微小的,这可能不是很好。一个16字节的版本可能(或许)是有用的,虽然(有条件地运行在第二次迭代,该分公司应良好predicted,因为它会被很少考虑)。但是,这样短字符串,正常code是很难被击败。

edit: I wrote this answer before I saw your updated comment - if the strings are usually that tiny, this is probably not very good. A 16-byte version could (maybe) be useful though (run the second iteration conditionally, the branch for that should be well-predicted because it'll be rarely taken). But with such short strings, the normal code is hard to beat.

movdqa xmm0, [a]
pxor xmm1, xmm1
pcmpeqb xmm0, [b]
psadbw xmm0, xmm1
pextrw ax, xmm0, 0
movsx eax, ax
neg eax

这篇关于最快海明距离C实现的追捕的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆