Accessing array values via pointer arithmetic vs. subscripting in C

Question

I keep reading that, in C, using pointer arithmetic is generally faster than subscripting for array access. Is this true even with modern (supposedly-optimizing) compilers?

If so, is this still the case as I begin to move away from learning C into Objective-C and Cocoa on Macs?

Which is the preferred coding style for array access, in both C and Objective-C? Which is considered (by professionals of their respective languages) more legible, more "correct" (for lack of a better term)?

Recommended answer

You need to understand the reason behind this claim. Have you ever asked yourself why it would be faster? Let's compare some code:

#include <stdio.h>
#include <string.h>

int main(void) {
    int i;
    int a[20];

    // Init all values to zero
    memset(a, 0, sizeof(a));
    for (i = 0; i < 20; i++) {
        printf("Value of %d is %d\n", i, a[i]);
    }
    return 0;
}

They are all zero, what a surprise :-P The question is: what does a[i] actually mean in low-level machine code? It means:

1. Take the address of a in memory.
2. Add i times the size of a single item of a to that address (an int is usually four bytes).
3. Fetch the value from that address.

So each time you fetch a value from a, the base address of a is added to the result of multiplying i by four. If you just dereference a pointer, steps 1 and 2 don't need to be performed, only step 3.

Consider the code below.

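#include <stdio.h>
#include <string.h>

int main(void) {
    int i;
    int a[20];
    int *b;

    // Init all values to zero, then point b at the first element
    memset(a, 0, sizeof(a));
    b = a;
    for (i = 0; i < 20; i++) {
        printf("Value of %d is %d\n", i, *b);
        b++;  // advance to the next int
    }
    return 0;
}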

This code might be faster... but even if it is, the difference is tiny. Why might it be faster? "*b" is the same as step 3 above. However, "b++" is not the same as steps 1 and 2: "b++" simply increases the pointer by 4.

(Important for newbies: running ++ on a pointer does not advance the pointer by one byte in memory! It advances the pointer by as many bytes as the size of the data it points to. Here it points to an int, and an int is four bytes on my machine, so b++ increases b by four!)

Okay, but why might it be faster? Because adding four to a pointer is cheaper than multiplying i by four and adding the result to a pointer. There is an addition in either case, but in the second one there is no multiplication (you save the CPU time needed for one multiplication). Considering the speed of modern CPUs, though, I wonder whether you could really measure a difference in a benchmark, even if the array had one million elements.

Whether a modern compiler optimizes both variants to be equally fast is something you can check by looking at the assembly output it produces. You do so by passing the "-S" option (capital S) to GCC.

Here's the assembly for the first C snippet (compiled at optimization level -Os, which optimizes for both code size and speed but skips speed optimizations that would noticeably increase code size, unlike -O2 and even more unlike -O3):

_main:
    pushl   %ebp
    movl    %esp, %ebp
    pushl   %edi
    pushl   %esi
    pushl   %ebx
    subl    $108, %esp
    call    ___i686.get_pc_thunk.bx
"L00000000001$pb":
    leal    -104(%ebp), %eax
    movl    $80, 8(%esp)
    movl    $0, 4(%esp)
    movl    %eax, (%esp)
    call    L_memset$stub
    xorl    %esi, %esi
    leal    LC0-"L00000000001$pb"(%ebx), %edi
L2:
    movl    -104(%ebp,%esi,4), %eax
    movl    %eax, 8(%esp)
    movl    %esi, 4(%esp)
    movl    %edi, (%esp)
    call    L_printf$stub
    addl    $1, %esi
    cmpl    $20, %esi
    jne L2
    addl    $108, %esp
    popl    %ebx
    popl    %esi
    popl    %edi
    popl    %ebp
    ret

And the same for the second snippet:

_main:
    pushl   %ebp
    movl    %esp, %ebp
    pushl   %edi
    pushl   %esi
    pushl   %ebx
    subl    $124, %esp
    call    ___i686.get_pc_thunk.bx
"L00000000001$pb":
    leal    -104(%ebp), %eax
    movl    %eax, -108(%ebp)
    movl    $80, 8(%esp)
    movl    $0, 4(%esp)
    movl    %eax, (%esp)
    call    L_memset$stub
    xorl    %esi, %esi
    leal    LC0-"L00000000001$pb"(%ebx), %edi
L2:
    movl    -108(%ebp), %edx
    movl    (%edx,%esi,4), %eax
    movl    %eax, 8(%esp)
    movl    %esi, 4(%esp)
    movl    %edi, (%esp)
    call    L_printf$stub
    addl    $1, %esi
    cmpl    $20, %esi
    jne L2
    addl    $124, %esp
    popl    %ebx
    popl    %esi
    popl    %edi
    popl    %ebp
    ret

Well, it's different, that's for sure. The difference between the numbers 104 and 108 comes from the variable b (in the first snippet there was one variable fewer on the stack; now there is one more, which changes the stack addresses). The real difference in the for loop is

movl    -104(%ebp,%esi,4), %eax

compared to

movl    -108(%ebp), %edx
movl    (%edx,%esi,4), %eax

Actually, to me it rather looks like the first approach is faster(!), since it issues a single machine instruction to perform all the work (the CPU does it all for us) instead of two. On the other hand, the two instructions of the second variant might take less time altogether than the single instruction of the first.

As a closing word, I'd say that, depending on your compiler and the CPU's capabilities (which instructions the CPU offers for accessing memory, and in what way), the result might go either way. Either one might be faster or slower. You cannot say for sure unless you limit yourself to exactly one compiler (which also means one version) and one specific CPU. As CPUs can do more and more in a single instruction (ages ago, a compiler really had to fetch the address, multiply i by four, and add the two together before fetching the value), statements that used to be absolute truths are nowadays more and more questionable. Also, who knows how CPUs work internally? Above I compare one assembly instruction to two other ones.

I can see that the number of instructions is different, and the time such an instruction takes can be different as well. How much memory these instructions need in their machine representation (they have to be transferred from memory to the CPU cache, after all) is also different. However, modern CPUs don't execute instructions the way you feed them. They split big instructions (often referred to as CISC) into small sub-instructions (often referred to as RISC), which also allows them to better optimize program flow for speed internally. In fact, the single instruction of the first variant and the two instructions of the second might result in the same set of sub-instructions, in which case there is no measurable speed difference whatsoever.

Regarding Objective-C: it is just C with extensions, so everything that holds true for C holds true for Objective-C as well, as far as pointers and arrays are concerned. If you use objects, on the other hand (for example, an NSArray or NSMutableArray), that is a completely different beast. In that case, however, you have to access these arrays through methods anyway; there is no pointer/array access to choose from.