从SIMD向量中提取设置的字节位置 [英] Extract set bytes position from SIMD vector

查看：320 发布时间：2020/11/30 4:48:48 c++ sse simd intrinsics

本文介绍了从SIMD向量中提取设置的字节位置的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我使用SIMD指令运行一系列计算.这些指令返回一个16字节的向量，结果为compare，每个字节为0x00或0xff:

I run a bench of computations using SIMD intructions. These instructions return a vector of 16 bytes as result, named compare, with each byte being 0x00 or 0xff :

             0    1    2    3    4    5    6    7       15   16
compare : 0x00 0x00 0x00 0x00 0xff 0x00 0x00 0x00 ... 0xff 0x00

将字节设置为0xff意味着我需要运行函数do_operation(i) ，而我是字节的位置.

Bytes set to 0xff mean I need to run the function do_operation(i) with i being the position of the byte.

例如，上面的compare向量表示，我需要运行以下操作序列:

For instance, the above compare vector mean, I need to run this sequence of operations :

do_operation(4);
do_operation(15);

这是到目前为止我想到的最快的解决方案:

Here is the fastest solution I came up with until now :

for(...) {
        //
        // SIMD computations
        //
        __m128i compare = ... // Result of SIMD computations

        // Extract high and low quadwords for compare vector
        std::uint64_t cmp_low = (_mm_cvtsi128_si64(compare));
        std::uint64_t cmp_high = (_mm_extract_epi64(compare, 1));

        //  Process low quadword 
        if (cmp_low) {
            const std::uint64_t low_possible_positions = 0x0706050403020100;
            const std::uint64_t match_positions = _pext_u64(
                    low_possible_positions, cmp_low);
            const int match_count = _popcnt64(cmp_low) / 8;
            const std::uint8_t* match_pos_array =
                    reinterpret_cast<const std::uint8_t*>(&match_positions);

            for (int i = 0; i < match_count; ++i) {
                do_operation(i);
            }
        }

        // Process high quadword (similarly)
        if (cmp_high) { 

            const std::uint64_t high_possible_positions = 0x0f0e0d0c0b0a0908;
            const std::uint64_t match_positions = _pext_u64(
                    high_possible_positions, cmp_high);
            const int match_count = _popcnt64(cmp_high) / 8;
            const std::uint8_t* match_pos_array =
                    reinterpret_cast<const std::uint8_t*>(&match_positions);

            for(int i = 0; i < match_count; ++i) {
                do_operation(i);
            }
        }
}

我首先提取128位向量的第一和第二64位整数(cmp_low和cmp_high).然后，我使用popcount计算设置为0xff的字节数(设置为1的位数除以8).最后，我使用pext来获取没有零的位置，如下所示:

I start with extracting the first and second 64 bits integers of the 128 bits vector (cmp_low and cmp_high). Then I use popcount to compute the number of bytes set to 0xff (number of bits set to 1 divided by 8). Finally, I use pext to get positions, without zeros, like this :

0x0706050403020100
0x000000ff00ff0000
        |
      PEXT
        |
0x0000000000000402

我想找到一个更快的解决方案，以提取compare向量中设置为0xff的字节的位置.更准确地说，compare向量中的0xff通常仅将 0、1或2个字节设置为0xff ，我想使用此信息来避免出现某些分支.

I would like to find a faster solution to extract the positions of the bytes set to 0xff in the compare vector. More precisely, the are very often only 0, 1 or 2 bytes set to 0xff in the compare vector and I would like to use this information to avoid some branches.

从SIMD向量中提取设置的字节位置 [英] Extract set bytes position from SIMD vector

问题描述

推荐答案

相关文章

C/C++开发最新文章

热门教程

热门工具

登录关闭

从SIMD向量中提取设置的字节位置 [英] Extract set bytes position from SIMD vector

问题描述

推荐答案

相关文章

C/C++开发最新文章

热门教程

热门工具

登录 关闭

登录关闭