Accessing array values via pointer arithmetic vs. subscripting in C

Question

I keep reading that, in C, pointer arithmetic is generally faster than subscripting for array access. Is this true even with modern (supposedly optimizing) compilers?

If so, is it still the case as I move from learning C to Objective-C and Cocoa on the Mac?

Which is the preferred coding style for array access in both C and Objective-C? Which do professionals in each language consider more legible and more "correct" (for lack of a better term)?

Recommended answer

You need to understand the reason behind this claim. Have you ever asked yourself why it would be faster? Let's compare some code:

int i;
int a[20];

// Init all values to zero
memset(a, 0, sizeof(a));
for (i = 0; i < 20; i++) {
    printf("Value of %d is %d\n", i, a[i]);
}

They are all zero, what a surprise :-P The question is: what does a[i] actually mean in low-level machine code? It means:

  1. Take the address of a in memory.
  2. Add i times the size of a single item of a to that address (an int is usually four bytes).
  3. Fetch the value from that address.

So each time you fetch a value from a, the base address of a is added to the result of multiplying i by four. If you just dereference a pointer, steps 1 and 2 don't need to be performed, only step 3.
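The three steps can be spelled out by hand in C. This is only a sketch to illustrate the idea: the function names fetch_by_steps and fetch_by_subscript are made up for this example, and a real compiler is free to generate something quite different.

```c
#include <stddef.h>

/* Hypothetical helper: reads element i of a by performing the three
 * steps explicitly, using byte-level address arithmetic. */
int fetch_by_steps(const int *a, size_t i)
{
    /* Step 1: take the address of a (as a byte address). */
    const unsigned char *base = (const unsigned char *)a;

    /* Step 2: add i times the size of one element. */
    const unsigned char *addr = base + i * sizeof(int);

    /* Step 3: fetch the value stored at that address. */
    return *(const int *)addr;
}

/* The same access written the usual way. The language defines
 * a[i] as *(a + i), so both functions read the same element. */
int fetch_by_subscript(const int *a, size_t i)
{
    return a[i];
}
```

For any valid index, both functions should return the same value, which is the point: subscripting is just shorthand for the address computation.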

Now consider the code below.

int i;
int a[20];
int * b;

memset(a, 0, sizeof(a));
b = a;
for (i = 0; i < 20; i++) {
    printf("Value of %d is %d\n", i, *b);
    b++;
}

This code might be faster... but even if it is, the difference is tiny. Why might it be faster? "*b" is the same as step 3 above. However, "b++" is not the same as steps 1 and 2: "b++" simply increases the pointer by 4.

(Important for newbies: running ++ on a pointer does not advance it by one byte in memory! It advances it by as many bytes as the size of the type it points to. Here it points to an int, and an int is four bytes on my machine, so b++ increases b by four!)
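A small sketch can make the step size visible. The helper name int_pointer_step is invented for this illustration; it measures how many bytes b moves when it is incremented once.

```c
#include <stddef.h>

/* Hypothetical helper: returns the number of bytes an int pointer
 * advances when incremented by one. */
size_t int_pointer_step(void)
{
    int a[2] = {0, 0};
    int *b = a;

    const char *before = (const char *)b;  /* byte address before the increment */
    b++;                                   /* moves one element, not one byte */
    const char *after = (const char *)b;   /* byte address after the increment */

    return (size_t)(after - before);
}
```

The result equals sizeof(int), which is four on many machines but is not guaranteed to be four everywhere.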

Okay, but why might it be faster? Because adding four to a pointer is faster than multiplying i by four and adding the result to a pointer. You have an addition in either case, but in the pointer version there is no multiplication, so you save the CPU time one multiplication would take. Considering the speed of modern CPUs, though, I wonder whether you could really measure a difference in a benchmark, even if the array had one million elements.

A modern compiler can optimize either version to be equally fast, and you can check this by looking at the assembly output it produces. To get that output, pass the "-S" option (capital S) to GCC.
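If you want to try this yourself, one way is to put both access styles into separate functions so they are easy to find in the generated assembly. The following is a sketch under assumptions: the function names sum_subscript and sum_pointer and the file name compare.c are made up for this example. Saved as compare.c, it could be compiled with "gcc -S -Os compare.c", which writes the assembly to compare.s.

```c
#include <stddef.h>

/* Hypothetical example: sums n ints using subscript access. */
long sum_subscript(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}

/* Hypothetical example: sums n ints by walking a pointer. */
long sum_pointer(const int *a, size_t n)
{
    long total = 0;
    const int *end = a + n;   /* one past the last element */
    for (const int *p = a; p != end; p++) {
        total += *p;
    }
    return total;
}
```

Both functions compute the same sum. Whether the compiler emits the same loop for both depends on the compiler, its version, the optimization level, and the target CPU, so compare the output on your own setup.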

Here is the assembly for the first C snippet, the one using a[i] (compiled at optimization level -Os, which optimizes for code size and speed but skips speed optimizations that would noticeably increase code size, unlike -O2 and very much unlike -O3):

_main:
    pushl   %ebp
    movl    %esp, %ebp
    pushl   %edi
    pushl   %esi
    pushl   %ebx
    subl    $108, %esp
    call    ___i686.get_pc_thunk.bx
"L00000000001$pb":
    leal    -104(%ebp), %eax
    movl    $80, 8(%esp)
    movl    $0, 4(%esp)
    movl    %eax, (%esp)
    call    L_memset$stub
    xorl    %esi, %esi
    leal    LC0-"L00000000001$pb"(%ebx), %edi
L2:
    movl    -104(%ebp,%esi,4), %eax
    movl    %eax, 8(%esp)
    movl    %esi, 4(%esp)
    movl    %edi, (%esp)
    call    L_printf$stub
    addl    $1, %esi
    cmpl    $20, %esi
    jne L2
    addl    $108, %esp
    popl    %ebx
    popl    %esi
    popl    %edi
    popl    %ebp
    ret

And here is the same for the second snippet, the one using the pointer b:

_main:
    pushl   %ebp
    movl    %esp, %ebp
    pushl   %edi
    pushl   %esi
    pushl   %ebx
    subl    $124, %esp
    call    ___i686.get_pc_thunk.bx
"L00000000001$pb":
    leal    -104(%ebp), %eax
    movl    %eax, -108(%ebp)
    movl    $80, 8(%esp)
    movl    $0, 4(%esp)
    movl    %eax, (%esp)
    call    L_memset$stub
    xorl    %esi, %esi
    leal    LC0-"L00000000001$pb"(%ebx), %edi
L2:
    movl    -108(%ebp), %edx
    movl    (%edx,%esi,4), %eax
    movl    %eax, 8(%esp)
    movl    %esi, 4(%esp)
    movl    %edi, (%esp)
    call    L_printf$stub
    addl    $1, %esi
    cmpl    $20, %esi
    jne L2
    addl    $124, %esp
    popl    %ebx
    popl    %esi
    popl    %edi
    popl    %ebp
    ret

Well, it's different, that's for sure. The difference between the numbers 104 and 108 comes from the variable b (the first snippet had one variable fewer on the stack; now there is one more, which shifts the stack addresses). The real difference inside the for loop is

movl    -104(%ebp,%esi,4), %eax

compared to

movl    -108(%ebp), %edx
movl    (%edx,%esi,4), %eax

To me it actually looks as if the first approach is faster(!), since it issues a single machine instruction that performs all the work (the CPU does it all for us) instead of two. On the other hand, the two instructions of the second approach might together take less time than the single instruction of the first.

As a closing word, I'd say that depending on your compiler and the CPU's capabilities (which instructions the CPU offers for accessing memory, and in what way), the result might go either way. Either one might be faster or slower. You cannot say for sure unless you limit yourself to exactly one compiler (which also means one version) and one specific CPU. Because CPUs can do more and more in a single instruction (ages ago, a compiler really had to fetch the address, multiply i by four, and add the two together before fetching the value), statements that were absolute truths ages ago are more and more questionable today. And who knows how CPUs work internally? Above I compare one assembly instruction with two others.

I can see that the number of instructions differs, and the time each instruction needs can differ as well. The amount of memory these instructions occupy in their machine-code form also differs (they have to be transferred from memory to the CPU cache, after all). However, modern CPUs don't execute instructions the way you feed them. They split big instructions (often referred to as CISC) into small sub-instructions (micro-operations, often loosely referred to as RISC), which also lets them optimize program flow for speed internally. In fact, the single instruction of the first version and the two instructions of the second might result in the same set of sub-instructions, in which case there is no measurable speed difference whatsoever.

Regarding Objective-C: it is just C with extensions, so everything that holds true for C also holds true for Objective-C as far as pointers and arrays are concerned. If you use objects, on the other hand (for example, an NSArray or NSMutableArray), that is a completely different beast. In that case, however, you must access these arrays through methods anyway, so there is no pointer versus subscript choice to make.
