检查快捷方式,如果字符数组为零 [英] fast way to check if an array of chars is zero

查看:117
本文介绍了检查快捷方式,如果字符数组为零的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有个字节的数组,在存储器中。什么是看是否在阵列中的所有字节都是零的最快方法?

I have an array of bytes, in memory. What's the fastest way to see if all the bytes in the array are zero?

推荐答案

目前,短期使用 SIMD 扩展(如x86处理器 SSE ),你还不如迭代这个数组并每个值比较为0。

Nowadays, short of using SIMD extensions (such as SSE on x86 processors), you might as well iterate over the array and compare each value to 0.

<强>在遥远的过去下,执行用于该阵列(除了回路分支本身)中的每个元素的比较和条件分支将被认为是昂贵的,这取决于如何经常(或早),你可以期望一个非零元素出现在阵中,你可能会选到完全的离不开循环内部条件语句,仅仅使用按位或检测到任何设置位,并推迟的实际检查直到循环完成后:

In the distant past, performing a comparison and conditional branch for each element in the array (in addition to the loop branch itself) would have been deemed expensive and, depending on how often (or early) you could expect a non-zero element to appear in the array, you might have elected to completely do without conditionals inside the loop, using solely bitwise-or to detect any set bits and deferring the actual check until after the loop completes:

int sum = 0;
for (i = 0; i < ARRAY_SIZE; ++i) {
  sum |= array[i];
}
if (sum != 0) {
  printf("At least one array element is non-zero\n");
}

然而,今天的流水线超标量处理器设计完整的分支prediction ,所有非SSE方法是一个循环内virtualy没有什么区别。如果有的话,每一个元素比较为零,并打破循环的早期(只要遇到第一个非零元素)可能是,从长远来看,比总和更高效| =阵列[我] 办法(它总是穿越整个数组),除非,那就是,你希望你的阵列几乎总是全部由零的(在这种情况下,使总和| =阵列[我] 办法通过gcc的 -funroll-循环真正的网点可以给你更好的数字 - 请参阅下面的号码的速龙处理器,的结果可能会与处理器的型号和制造商会有所不同。)

However, with today's pipelined super-scalar processor designs complete with branch prediction, all non-SSE approaches are virtualy indistinguishable within a loop. If anything, comparing each element to zero and breaking out of the loop early (as soon as the first non-zero element is encountered) could be, in the long run, more efficient than the sum |= array[i] approach (which always traverses the entire array) unless, that is, you expect your array to be almost always made up exclusively of zeroes (in which case making the sum |= array[i] approach truly branchless by using GCC's -funroll-loops could give you the better numbers -- see the numbers below for an Athlon processor, results may vary with processor model and manufacturer.)

#include <stdio.h>

int a[1024*1024];

/* Methods 1 & 2 are equivalent on x86 */  

int main() {
  int i, j, n;

# if defined METHOD3
  int x;
# endif

  for (i = 0; i < 100; ++i) {
#   if defined METHOD3
    x = 0;
#   endif
    for (j = 0, n = 0; j < sizeof(a)/sizeof(a[0]); ++j) {
#     if defined METHOD1
      if (a[j] != 0) { n = 1; }
#     elif defined METHOD2
      n |= (a[j] != 0);
#     elif defined METHOD3
      x |= a[j];
#     endif
    }
#   if defined METHOD3
    n = (x != 0);
#   endif

    printf("%d\n", n);
  }
}

$ uname -mp
i686 athlon
$ gcc -g -O3 -DMETHOD1 test.c
$ time ./a.out
real    0m0.376s
user    0m0.373s
sys     0m0.003s
$ gcc -g -O3 -DMETHOD2 test.c
$ time ./a.out
real    0m0.377s
user    0m0.372s
sys     0m0.003s
$ gcc -g -O3 -DMETHOD3 test.c
$ time ./a.out
real    0m0.376s
user    0m0.373s
sys     0m0.003s

$ gcc -g -O3 -DMETHOD1 -funroll-loops test.c
$ time ./a.out
real    0m0.351s
user    0m0.348s
sys     0m0.003s
$ gcc -g -O3 -DMETHOD2 -funroll-loops test.c
$ time ./a.out
real    0m0.343s
user    0m0.340s
sys     0m0.003s
$ gcc -g -O3 -DMETHOD3 -funroll-loops test.c
$ time ./a.out
real    0m0.209s
user    0m0.206s
sys     0m0.003s

这篇关于检查快捷方式,如果字符数组为零的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆