_mm_extract_epi8(...) intrinsic that takes a non-literal integer as argument


Problem description

I've lately been using the SSE4.1 intrinsic int _mm_extract_epi8(__m128i src, const int ndx), which, according to the reference, "extracts an integer byte from a packed integer array element selected by index". This is exactly what I want.

However, I determine the index via _mm_cmpestri on an __m128i, which performs a packed comparison of string data with explicit lengths and generates an index. This index ranges over 0..16, where 0..15 is a valid index and 16 means that no match was found. To extract the integer at that index, I thought of doing the following:

const int index = _mm_cmpestri(...);
if (index >= 0 && index < 16) {
  int intAtIndex = _mm_extract_epi8(..., index);
}

This leaves us with the gcc (-O0) compiler error:

error: selector must be an integer constant in the range 0..15

A nasty way around this issue is to switch on the index, with a _mm_extract_epi8 call for each index in the range 0..15. My question is whether there is a better/nicer way that I'm missing.

Update: with -O3 optimization there is no compilation error; with -O0 the error remains.

Recommended answer

Just to summarize and close the question.

We discussed three options for extracting the byte at index i in [0..15] from an __m128i sse, where i cannot be reduced to a literal at compile time:

1) Switch & _mm_extract_epi8: switch over i, with a case for each i in [0..15] that calls _mm_extract_epi8(sse, i); this works because inside each case i is a compile-time literal.

2) Union hack: declare union SSE128i { __m128i sse; char array[16]; };, initialize it as SSE128i sse = { _mm_loadu_si128(...) }; and access the byte at index i with sse.array[i].

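3) Shuffle the ith element to position 0, then _mm_extract_epi8: use _mm_shuffle_epi8(sse, _mm_set1_epi8(i)) to shuffle the ith element to position 0 (the all-i control vector actually copies it into every position); then extract it from the shuffled vector with _mm_extract_epi8(..., 0).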
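The three options can be sketched side by side in C as follows. This is a minimal illustration, not the benchmark code (which was not published); the function names, the hypothetical TARGET_SSE41 helper macro, and the use of unsigned char in the union (so all three return the same zero-extended 0..255 value that _mm_extract_epi8 produces) are assumptions. It assumes an x86-64 target and GCC or Clang.

```c
#include <assert.h>
#include <immintrin.h>

/* Hypothetical helper: enables SSE4.1 (which implies SSSE3) for the marked
   functions only, so the file also builds without -msse4.1 (GCC/Clang).
   With -msse4.1 or -march=native it is not needed. */
#define TARGET_SSE41 __attribute__((target("sse4.1")))

/* Option 1: switch on i, so each _mm_extract_epi8 call sees a literal selector. */
#define EXTRACT_CASE(n) case n: return _mm_extract_epi8(v, n);

static TARGET_SSE41 int extract_byte_switch(__m128i v, int i)
{
    assert(i >= 0 && i < 16);
    switch (i) {
    EXTRACT_CASE(0)  EXTRACT_CASE(1)  EXTRACT_CASE(2)  EXTRACT_CASE(3)
    EXTRACT_CASE(4)  EXTRACT_CASE(5)  EXTRACT_CASE(6)  EXTRACT_CASE(7)
    EXTRACT_CASE(8)  EXTRACT_CASE(9)  EXTRACT_CASE(10) EXTRACT_CASE(11)
    EXTRACT_CASE(12) EXTRACT_CASE(13) EXTRACT_CASE(14) EXTRACT_CASE(15)
    }
    return -1; /* only reached for an out-of-range i when assert is disabled */
}

/* Option 2: union type punning. Defined behaviour in C; in ISO C++ reading the
   member that was not last written is undefined, although GCC documents it
   as supported. Needs only SSE2, which is baseline on x86-64. */
typedef union {
    __m128i sse;
    unsigned char array[16];
} SSE128i;

static int extract_byte_union(__m128i v, int i)
{
    SSE128i u = { v };  /* initializes the first member, sse */
    assert(i >= 0 && i < 16);
    return u.array[i];
}

/* Option 3: PSHUFB (SSSE3) with every control byte set to i copies byte i
   into all 16 lanes; lane 0 is then extracted with a literal selector. */
static TARGET_SSE41 int extract_byte_shuffle(__m128i v, int i)
{
    assert(i >= 0 && i < 16);
    __m128i bcast = _mm_shuffle_epi8(v, _mm_set1_epi8((char)i));
    return _mm_extract_epi8(bcast, 0);
}
```

Each function asserts 0 <= i < 16, so the caller still has to filter out _mm_cmpestri's "not found" value 16 first, as the question's range check does.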

Evaluation: I benchmarked the three options on Intel Sandy Bridge and AMD Bulldozer machines. The switch option won by a small margin. If anyone is interested, I can post more detailed numbers and the benchmark setup.

Update: evaluation. Benchmark setup: parse each byte of a 1 GB file and increment a counter for certain special bytes. _mm_cmpistri finds the index of a special byte; the byte is then "extracted" using one of the three methods above, and a case distinction on it decides which counter is incremented. The code was compiled with GCC 4.6 using -std=c++0x -O3 -march=native.
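A minimal sketch of that kind of scanning loop in C, since the original sources were not released. Everything specific here is an assumption: the special bytes (newline, comma, double quote), the names count_special and Counts, and the TARGET_SSE42 macro are hypothetical. It also swaps the benchmark's _mm_cmpistri for the explicit-length _mm_cmpestri from the question, so that a NUL byte in the data does not end a 16-byte chunk early, and it uses option 3 for the extraction.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <string.h>

/* SSE4.2 (for PCMPESTRI) implies SSE4.1 and SSSE3; the attribute lets the
   file build without -msse4.2 (GCC/Clang). */
#define TARGET_SSE42 __attribute__((target("sse4.2")))

#define NSPECIAL 3  /* hypothetical special bytes: '\n', ',' and '"' */

typedef struct { size_t newlines, commas, quotes; } Counts;  /* hypothetical counters */

static TARGET_SSE42 Counts count_special(const unsigned char *buf, size_t len)
{
    Counts c = {0, 0, 0};
    /* Needle: only the first NSPECIAL bytes take part (explicit length). */
    static const unsigned char needle_bytes[16] = {'\n', ',', '"'};
    const __m128i needle = _mm_loadu_si128((const __m128i *)needle_bytes);
    size_t pos = 0;

    while (pos < len) {
        size_t n = len - pos < 16 ? len - pos : 16;
        __m128i hay;
        if (n == 16) {
            hay = _mm_loadu_si128((const __m128i *)(buf + pos));
        } else {
            unsigned char tail[16] = {0};  /* avoid reading past the end of buf */
            memcpy(tail, buf + pos, n);
            hay = _mm_loadu_si128((const __m128i *)tail);
        }
        /* Index of the first of the n valid bytes equal to any needle byte; 16 if none. */
        int idx = _mm_cmpestri(needle, NSPECIAL, hay, (int)n,
                               _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY |
                               _SIDD_LEAST_SIGNIFICANT);
        if (idx == 16) {
            pos += n;
            continue;
        }
        /* "Extract" the byte at the runtime index (option 3: shuffle it to lane 0). */
        int byte = _mm_extract_epi8(_mm_shuffle_epi8(hay, _mm_set1_epi8((char)idx)), 0);
        switch (byte) {  /* case distinction that picks the counter */
        case '\n': c.newlines++; break;
        case ',':  c.commas++;   break;
        case '"':  c.quotes++;   break;
        }
        pos += (size_t)idx + 1;  /* resume just after the matched byte */
    }
    return c;
}
```

Restarting the load just after each match keeps the sketch simple and counts several special bytes in the same 16-byte window, at the cost of extra unaligned loads when matches are dense.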

For each method, the benchmark was run 25 times on a Sandy Bridge machine. Results (mean and standard deviation of the running time, in seconds):

Switch and extract: mean 1071.45, standard deviation 2.72006

Union hack: mean 1078.61, standard deviation 2.87131

Shuffle and extract from position 0: mean 1079.32, standard deviation 2.69808
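As a rough check on "won by a small margin", the reported numbers can be turned into relative differences and standard errors. This assumes the 25 runs per method are independent, treats the reported standard deviations as per-run values, and uses a normal approximation; it is an illustration derived from the table above, not part of the original evaluation.

```latex
\begin{aligned}
\Delta_{\text{union}}   &= 1078.61 - 1071.45 = 7.16\ \text{s} \approx 0.67\,\% \text{ of } 1071.45 \\
\Delta_{\text{shuffle}} &= 1079.32 - 1071.45 = 7.87\ \text{s} \approx 0.73\,\% \\
\mathrm{SE}_{\text{switch}} &= \tfrac{2.72006}{\sqrt{25}} \approx 0.544, \quad
\mathrm{SE}_{\text{union}} = \tfrac{2.87131}{5} \approx 0.574, \quad
\mathrm{SE}_{\text{shuffle}} = \tfrac{2.69808}{5} \approx 0.540 \\
\frac{\Delta_{\text{union}}}{\sqrt{\mathrm{SE}_{\text{switch}}^2 + \mathrm{SE}_{\text{union}}^2}}
  &\approx \frac{7.16}{0.791} \approx 9.1, \qquad
\frac{1079.32 - 1078.61}{\sqrt{\mathrm{SE}_{\text{union}}^2 + \mathrm{SE}_{\text{shuffle}}^2}}
  \approx \frac{0.71}{0.788} \approx 0.9
\end{aligned}
```

Under these assumptions the switch's lead of under 1 % spans many standard errors, while the union and shuffle variants cannot be told apart.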

The differences are marginal. I haven't had a chance to look at the generated asm yet, although it might be interesting to see where the differences come from. For now I can't release the full benchmark code because it contains non-public sources; if I have time, I'll extract the relevant parts and post them.
