Substitute a byte with another one


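Question

I am finding it hard to write code for this seemingly simple problem.

Given a vector of packed 8-bit integers, substitute one byte value with another wherever it is present.

For instance, I want to substitute 0x06 with 0x01, so I can do the following, with res as the input, to find 0x06: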

// Bytes to be manipulated
res = _mm_set_epi8(0x00, 0x03, 0x02, 0x06, 0x0F, 0x02, 0x02, 0x06, 0x0A, 0x03, 0x02, 0x06, 0x00, 0x00, 0x02, 0x06);

// Target value and substitution
val = _mm_set1_epi8(0x06);
sub = _mm_set1_epi8(0x01);

// Find the target
sse = _mm_cmpeq_epi8(res, val);

// Isolate target
sse = _mm_and_si128(res, sse);

// Isolate remaining bytes
adj = _mm_andnot_si128(sse, res);

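Now I am not sure how to combine these two parts: I need to remove the target and put the replacement byte in its place.

What SIMD instruction am I missing here?

As with my other questions, I am limited to AVX; I do not have a newer processor.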

Solution

What you essentially need to do is zero out all bytes of the input that you want to substitute, zero out all other bytes of the substitution vector, and OR the two results. The mask returned by _mm_cmpeq_epi8 already selects the right bytes for both steps. Overall, this can be done like this:

__m128i mask = _mm_cmpeq_epi8(inp, val);
return _mm_or_si128(_mm_and_si128(mask, sub), _mm_andnot_si128(mask, inp));
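
As a minimal sketch, the snippet above could be wrapped into a complete function as follows. The function names replace_byte_sse2 and replace_byte_16 are made up for illustration and are not part of the original answer; only SSE2 intrinsics are used.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Replace every byte equal to `target` with `repl` in one 128-bit vector.
   (Hypothetical helper name, chosen for this example.) */
static __m128i replace_byte_sse2(__m128i inp, uint8_t target, uint8_t repl)
{
    const __m128i val  = _mm_set1_epi8((char)target);
    const __m128i sub  = _mm_set1_epi8((char)repl);
    /* Each mask byte is 0xFF where inp == target, 0x00 elsewhere. */
    const __m128i mask = _mm_cmpeq_epi8(inp, val);
    /* (mask & sub) keeps repl in matching lanes,
       (~mask & inp) keeps the original bytes in all other lanes. */
    return _mm_or_si128(_mm_and_si128(mask, sub),
                        _mm_andnot_si128(mask, inp));
}

/* Hypothetical convenience wrapper for a plain 16-byte array.
   Unaligned loads and stores are used, so no alignment is assumed. */
static void replace_byte_16(const uint8_t in[16], uint8_t out[16],
                            uint8_t target, uint8_t repl)
{
    const __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, replace_byte_sse2(v, target, repl));
}
```

Calling replace_byte_16(in, out, 0x06, 0x01) on the 16 bytes from the question should leave every byte unchanged except the 0x06 bytes, which become 0x01.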

Since this combination of and/andnot/or is very common, SSE4.1 introduced an instruction (pblendvb, exposed as _mm_blendv_epi8) which essentially combines the three into one:

__m128i mask = _mm_cmpeq_epi8(inp, val);
return _mm_blendv_epi8(inp, sub, mask);
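
A self-contained sketch of the blend variant might look like the following. The names TARGET_SSE41 and replace_byte_16_blend are invented for this example. The target attribute is an assumption that the compiler is GCC or clang; it lets the function use SSE4.1 without passing -msse4.1 for the whole file. A CPU with AVX, as in the question, also supports SSE4.1.

```c
#include <smmintrin.h>  /* SSE4.1 intrinsics (also pulls in SSE2) */
#include <stdint.h>

/* Hypothetical helper macro: enable SSE4.1 for a single function on
   GCC/clang when the file is not already compiled with -msse4.1. */
#if defined(__GNUC__) && !defined(__SSE4_1__)
#define TARGET_SSE41 __attribute__((target("sse4.1")))
#else
#define TARGET_SSE41
#endif

/* Replace every byte equal to `target` with `repl` in a 16-byte array.
   (Hypothetical function name, chosen for this example.) */
TARGET_SSE41
static void replace_byte_16_blend(const uint8_t in[16], uint8_t out[16],
                                  uint8_t target, uint8_t repl)
{
    const __m128i inp  = _mm_loadu_si128((const __m128i *)in);
    const __m128i val  = _mm_set1_epi8((char)target);
    const __m128i sub  = _mm_set1_epi8((char)repl);
    /* Each mask byte is 0xFF where inp == target, 0x00 elsewhere. */
    const __m128i mask = _mm_cmpeq_epi8(inp, val);
    /* blendv picks the byte from `sub` where the top bit of the mask
       byte is set, and the byte from `inp` otherwise. */
    const __m128i res  = _mm_blendv_epi8(inp, sub, mask);
    _mm_storeu_si128((__m128i *)out, res);
}
```

Note the argument order: the first operand of _mm_blendv_epi8 supplies the bytes kept where the mask is clear, the second supplies the bytes taken where the mask is set.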

In fact, clang 5.0 and later are smart enough to replace the first variant with the second when compiling with optimization enabled: https://godbolt.org/z/P-tcik


N.B.: If the substitution value is in fact 0x01, you can exploit the fact that each byte of the mask (the result of the comparison) is either 0x00 or 0xff (which is -0x01 as a signed byte), i.e., you can zero out the bytes you want to substitute and then subtract the mask:

__m128i val = _mm_set1_epi8(0x06);
__m128i mask = _mm_cmpeq_epi8(inp, val);
return _mm_sub_epi8(_mm_andnot_si128(mask, inp), mask);

This saves either loading the 0x01 vector from memory or tying up a register for it, and depending on your architecture it may have slightly better throughput.
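
The subtraction trick can be sketched as a complete function like this. The name replace_byte_with_one_16 is hypothetical, and the sketch only covers the special case where the replacement value is 0x01.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Replace every byte equal to `target` with 0x01 in a 16-byte array,
   without materializing a vector of 0x01 bytes.
   (Hypothetical function name, chosen for this example.) */
static void replace_byte_with_one_16(const uint8_t in[16], uint8_t out[16],
                                     uint8_t target)
{
    const __m128i inp  = _mm_loadu_si128((const __m128i *)in);
    const __m128i val  = _mm_set1_epi8((char)target);
    /* Each mask byte is 0xFF (-1 as a signed byte) where inp == target,
       0x00 elsewhere. */
    const __m128i mask = _mm_cmpeq_epi8(inp, val);
    /* andnot zeroes the matching lanes. Subtracting the mask then gives
       0 - (-1) = 1 in those lanes and inp - 0 = inp in all others. */
    const __m128i res  = _mm_sub_epi8(_mm_andnot_si128(mask, inp), mask);
    _mm_storeu_si128((__m128i *)out, res);
}
```

For any other replacement value this shortcut does not apply, and the and/andnot/or or blend forms are needed instead.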
