检查字符数组是否为零的快速方法 [英] fast way to check if an array of chars is zero

查看:40
本文介绍了检查字符数组是否为零的快速方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在内存中有一个字节数组.查看数组中的所有字节是否为零的最快方法是什么?

I have an array of bytes, in memory. What's the fastest way to see if all the bytes in the array are zero?

推荐答案

如今,缺乏使用 SIMD 扩展(例如 x86 处理器上的 SSE),您可能以及遍历数组并将每个值与0进行比较.

Nowadays, short of using SIMD extensions (such as SSE on x86 processors), you might as well iterate over the array and compare each value to 0.

在遥远的过去,为数组中的每个元素(除了循环分支本身)执行比较和条件分支会被认为是昂贵的,并且取决于频率(或早) 您可能希望数组中出现一个非零元素,您可能已经选择完全在循环内没有条件,仅使用按位或检测任何设置位并推迟实际检查直到循环完成:

In the distant past, performing a comparison and conditional branch for each element in the array (in addition to the loop branch itself) would have been deemed expensive and, depending on how often (or early) you could expect a non-zero element to appear in the array, you might have elected to completely do without conditionals inside the loop, using solely bitwise-or to detect any set bits and deferring the actual check until after the loop completes:

int sum = 0;
for (i = 0; i < ARRAY_SIZE; ++i) {
  sum |= array[i];
}
if (sum != 0) {
  printf("At least one array element is non-zero
");
}

然而,随着当今的流水线超标量处理器设计完成了分支预测,所有非SSE 方法在循环中几乎无法区分.如果有的话,从长远来看,将每个元素与零进行比较并尽早跳出循环(一旦遇到第一个非零元素)可能比 sum |= array[i] 更有效 方法(它总是遍历整个数组),除非,也就是说,您希望数组几乎总是完全由零组成(在这种情况下,使 sum |= array[i] 使用 GCC 的 -funroll-loops 实现真正无分支的方法可以为您提供更好的数字——请参阅下面的 Athlon 处理器数字,结果可能因处理器型号和制造商而异.)

However, with today's pipelined super-scalar processor designs complete with branch prediction, all non-SSE approaches are virtualy indistinguishable within a loop. If anything, comparing each element to zero and breaking out of the loop early (as soon as the first non-zero element is encountered) could be, in the long run, more efficient than the sum |= array[i] approach (which always traverses the entire array) unless, that is, you expect your array to be almost always made up exclusively of zeroes (in which case making the sum |= array[i] approach truly branchless by using GCC's -funroll-loops could give you the better numbers -- see the numbers below for an Athlon processor, results may vary with processor model and manufacturer.)

#include <stdio.h>

int a[1024*1024];

/* Methods 1 & 2 are equivalent on x86 */  

int main() {
  int i, j, n;

# if defined METHOD3
  int x;
# endif

  for (i = 0; i < 100; ++i) {
#   if defined METHOD3
    x = 0;
#   endif
    for (j = 0, n = 0; j < sizeof(a)/sizeof(a[0]); ++j) {
#     if defined METHOD1
      if (a[j] != 0) { n = 1; }
#     elif defined METHOD2
      n |= (a[j] != 0);
#     elif defined METHOD3
      x |= a[j];
#     endif
    }
#   if defined METHOD3
    n = (x != 0);
#   endif

    printf("%d
", n);
  }
}

$ uname -mp
i686 athlon
$ gcc -g -O3 -DMETHOD1 test.c
$ time ./a.out
real    0m0.376s
user    0m0.373s
sys     0m0.003s
$ gcc -g -O3 -DMETHOD2 test.c
$ time ./a.out
real    0m0.377s
user    0m0.372s
sys     0m0.003s
$ gcc -g -O3 -DMETHOD3 test.c
$ time ./a.out
real    0m0.376s
user    0m0.373s
sys     0m0.003s

$ gcc -g -O3 -DMETHOD1 -funroll-loops test.c
$ time ./a.out
real    0m0.351s
user    0m0.348s
sys     0m0.003s
$ gcc -g -O3 -DMETHOD2 -funroll-loops test.c
$ time ./a.out
real    0m0.343s
user    0m0.340s
sys     0m0.003s
$ gcc -g -O3 -DMETHOD3 -funroll-loops test.c
$ time ./a.out
real    0m0.209s
user    0m0.206s
sys     0m0.003s

这篇关于检查字符数组是否为零的快速方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆