优化阵列压实 [英] Optimizing Array Compaction

查看：206 发布时间：2015/11/30 16:15:55 algorithm matlab sse simd

本文介绍了优化阵列压实的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

比方说，我有一个数组 K = [1 2 0 0 5 4 0]

Let's say I have an array k = [1 2 0 0 5 4 0]

我可以计算口罩如下 M = K＆GT; 0 = [1 1 0 0 1 1 0]

I can compute a mask as follows m = k > 0 = [1 1 0 0 1 1 0]

仅使用掩模米和下面的操作

Using only the mask m and the following operations

切换左/右
和/或
添加/减/乘

我可以压缩ķ成以下 [1 2 5 4]

I can compact k into the following [1 2 5 4]

下面是我现在怎么办呢（MATLAB伪code）：

Here's how I currently do it (MATLAB pseudocode):

function out = compact( in )
    d = in
    for i = 1:size(in, 2) %do (# of items in in) passes
        m = d > 0
        %shift left, pad w/ 0 on right
        ml = [m(2:end) 0] % shift
        dl = [d(2:end) 0] % shift

        %if the data originally has a gap, fill it in w/ the 
        %left shifted one
        use = (m == 0) & (ml == 1) %2 comparison  

        d = use .* dl + ~use .* d

        %zero out elements that have been moved to the left
        use_r = [0 use(1:end-1)]
        d = d .* ~use_r
    end

    out = d(1 : size(find(in > 0), 2)) %truncate the end
end

直觉

每个迭代，我们左移的面具和比较的面具。我们建立一个索引有左移的数据，如果我们发现这种转变，一个索引原本无效后（面膜[I] = 0）现在是有效的（屏蔽[我] = 1）。

Each iteration, we shift the mask left and compare the mask. We set a index to have the left shifted data if we find that after this shift, an index that was originally void(mask[i] = 0) is now valid(mask[i] = 1).

问题

上面的算法有O（N *（3 SHIFT + 2 +比较和+加+ 3乘））。有没有一种方法，以提高其效率？

The above algorithm has O(N * (3 shift + 2 comparison + AND + add + 3 multiplies)). Is there a way to improve its efficiency?

推荐答案

没有太多的优化，在原伪code。我看到一些小的改进在这里：

There is no much to optimize in the original pseudo-code. I see several small improvements here:

环可以执行一次迭代更少（即尺寸-1），
如果使用是零，你可能早破环，
使用=（M == 0）及（毫升== 1）可能会被简化为使用=〜M＆放大器;毫升，
如果〜算作独立的操作，这将是更好地使用反码的形式：使用= M | 〜毫升， D =〜使用* DL +使用* D ， use_r = [1使用（1：最终1）] ， D = D * use_r



loop may perform one iteration less (i.e. size-1),
if 'use' is zero, you may break the loop early,
use = (m == 0) & (ml == 1) probably may be simplified to use = ~m & ml,
if ~ is counted as separate operation, it would be better to use the inverted form : use = m | ~ml, d = ~use .* dl + use .* d, use_r = [1 use(1:end-1)], d = d .*use_r

但是，有可能去发明更好的算法。与算法的选择依赖于CPU资源使用的：
But it is possible to invent better algorithms. And the choice of algorithm depends on CPU resources used:

负载存储单元，即直接申请算法来记忆单词。没有什么可以在这里完成，直到芯片制造商加入高度平行散点图指令的指令集。
在SSE寄存器，即工作在整个16个字节的寄存器算法。如拟伪code算法不能帮助这里，因为我们已经有了这使工作更好的各种随机/置换指令。使用不同的比较与PMOVMSKB指令，通过4位分组的结果，并在开关/箱运用各种洗牌指令（如由去年codeR）是我们能做到的最好的。
 SSE / AVX注册到最新的指令集允许一个更好的方法。我们可以直接使用PMOVMSKB的结果，改变控制寄存器的是这样PSHUFB。
在整数寄存器，即GPR登记或对SSE / AVX寄存器数DWORD / QWORD部分（它允许执行几个独立的制件）同时工作。所提出的伪code应用于整数寄存器允许任何长度（从2到20位）的紧凑二进制子集。这里是我的算法，这很可能会表现得更好。

 C ++，64位，集宽度= 8：
C++, 64 bit, subset width = 8:
typedef unsigned long long ull;
const ull h = 0x8080808080808080;
const ull l = 0x0101010101010101;
const ull end = 0xffffffffffffffff;

// uncompacted bytes
ull x = 0x0100802300887700;

// set hi bit for zero bytes (see D.Knuth, volume 4)
ull m = h & ~(x | ((x|h) - l));

// bitmask for nonzero bytes
m = ~(m | (m - (m>>7)));

// tail zero bytes need no special treatment
m |= (m - 1);

while (m != end)
{
  ull tailm = m ^ (m + 1); // bytes to be processed
  ull tailx = x & tailm; // get the bytes
  tailm |= (tailm << 8); // shift 1 byte at a time
  m |= tailm; // all processed bytes are masked
  x = (x ^ tailx) | (tailx << 8); // actual byte shift
}


                        这篇关于优化阵列压实的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！


                    
                        查看全文

优化阵列压实 [英] Optimizing Array Compaction

问题描述

推荐答案

相关文章

C/C++最新文章

热门教程

热门工具

登录关闭

优化阵列压实 [英] Optimizing Array Compaction

问题描述

推荐答案

相关文章

C/C++最新文章

热门教程

热门工具

登录 关闭

登录关闭