How to compare two implementations of the same algorithm? (by examining their assembly code)

Problem description

Assume I have two implementations of the same algorithm in assembly. I would like to know, by examining the two code snippets, which one is faster.

The parameters I thought one might take into account are: number of opcodes, number of branches, number of function frames.

My questions are:

  1. Can I assume each opcode execution is one cycle?
  2. What is the overhead of a branch which breaks the pipeline?
  3. What are the effects and overhead of calling a function?
  4. Is there a difference in the analysis between ARM and x86?

The question is theoretical, since I have two implementations: one is 130 instructions long and the other is 184 instructions long.

And I would like to know if it is definitely true to say the 130-instruction snippet is faster than the 184-instruction implementation?

"BETTER == FASTER"

Solution

It is definitely not true to say that the 130-instruction code is faster than the 184-instruction code. It is very easy to have 1000 instructions run faster than 100, and vice versa, on either of these platforms.

1. Can I assume each opcode execution is one cycle?

Not in general. Start by looking at the advertised MIPS/MHz figure. It is a marketing number, but it gives a rough idea of what is possible. If the number is greater than one, then more than one instruction per clock is possible.
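As a rough illustration of that arithmetic, an advertised MIPS rating divided by the clock in MHz can be read as an average instructions-per-clock figure. The sketch below uses made-up ratings, not figures for any real core, and the function name `instructions_per_clock` is hypothetical.

```python
def instructions_per_clock(mips: float, mhz: float) -> float:
    """Average instructions per clock implied by a MIPS rating at a given clock rate."""
    if mhz <= 0:
        raise ValueError("clock frequency must be positive")
    # (million instructions / second) / (million clocks / second) = instructions / clock
    return mips / mhz


# Hypothetical ratings, chosen only to illustrate the arithmetic.
scalar_core = instructions_per_clock(mips=90.0, mhz=100.0)
superscalar_core = instructions_per_clock(mips=2500.0, mhz=1000.0)

for name, ipc in [("scalar", scalar_core), ("superscalar", superscalar_core)]:
    # The reciprocal is an average; individual instructions can take far more or less.
    print(f"{name}: {ipc:.2f} instr/clock, {1.0 / ipc:.2f} clocks/instr on average")
```

Either way the result is only an average over some benchmark, so it says little about how many cycles any particular opcode takes.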

2. What is the overhead of a branch which breaks the pipeline?

Anywhere from absolutely no effect to a very dramatic effect, on either system. The potential penalty ranges from one clock to hundreds.
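To see how wide that range can be, here is a minimal sketch of an expected-cost model for branches. The misprediction rates and penalty values are assumptions picked for illustration; real penalties depend on the pipeline depth and the memory system, and `branch_cycles` is a hypothetical helper.

```python
def branch_cycles(branches: int, mispredict_rate: float,
                  penalty_cycles: int, base_cycles: int = 1) -> float:
    """Expected cycles spent on branches.

    Each branch costs base_cycles, plus penalty_cycles whenever it is mispredicted.
    """
    if not 0.0 <= mispredict_rate <= 1.0:
        raise ValueError("mispredict_rate must be between 0 and 1")
    return branches * (base_cycles + mispredict_rate * penalty_cycles)


# Same 1000 branches under three assumed scenarios.
well_predicted = branch_cycles(1000, mispredict_rate=0.0, penalty_cycles=15)
coin_flip = branch_cycles(1000, mispredict_rate=0.5, penalty_cycles=15)
miss_to_slow_memory = branch_cycles(1000, mispredict_rate=0.5, penalty_cycles=200)

print(well_predicted, coin_flip, miss_to_slow_memory)
```

The branch count is identical in all three cases; only the assumed predictability and penalty change, which is why counting branches in the listing is not enough.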

3. What are the effects and overhead of calling a function?

Depends heavily on the function, and on the function calling it. Depending on the calling convention, you might have to save registers to the stack, or rearrange the contents of registers to prepare the parameters for the function being called. If you pass a struct by value, a copy of the struct may need to be made on the stack; the bigger the struct, the bigger the copy. Once in the function, a stack frame may need to be prepared, and so on. There are many factors involved. This question and its answer are also independent of platform.
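A toy count of the stack traffic described above might look like the following. It assumes the struct really is copied to the stack (some calling conventions pass small structs in registers instead) and counts only stores, ignoring the matching loads and any frame setup; `call_overhead_words` and its parameters are hypothetical.

```python
def call_overhead_words(saved_registers: int, struct_bytes: int,
                        word_bytes: int = 4) -> int:
    """Words stored to the stack for one call in this simplified model.

    Counts one word per saved register plus the words needed to copy a
    struct passed by value (rounded up to whole words).
    """
    if saved_registers < 0 or struct_bytes < 0 or word_bytes <= 0:
        raise ValueError("arguments must be non-negative and word_bytes positive")
    struct_words = (struct_bytes + word_bytes - 1) // word_bytes  # ceiling division
    return saved_registers + struct_words


leaf_call = call_overhead_words(saved_registers=0, struct_bytes=0)
small_struct = call_overhead_words(saved_registers=4, struct_bytes=8)
large_struct = call_overhead_words(saved_registers=8, struct_bytes=1024)

print(leaf_call, small_struct, large_struct)
```

Even in this simplified view the cost of a single call instruction varies by orders of magnitude depending on what the caller has to save and copy.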

4. Is there a difference in the analysis between ARM and x86?

Yes and no. Both systems use all the modern tricks (pipelining, branch prediction, etc.) to keep the MIPS/MHz up. ARM is going to give better MIPS per MHz than x86; x86, having variable-length instructions, might fit more instructions per unit of cache. How you analyze the cache, memory, and peripheral systems on the system side of the analysis is roughly the same. The comparison of the instructions and the core is similar in some respects and different in others, depending on which aspects you are analyzing. The ARM is not microcoded, while the x86 likely is, so on x86 you don't really see how many registers there really are, things like that.

At the same time, with x86 you can get a better look at the memory system than with ARM, since x86 parts are generally not systems on a chip. Depending on which ARM chip you buy, you may lose a lot of visibility at the boundaries of the chip; you might not see all the memory and peripheral buses, for example. (x86 is changing that, for example by putting PCIe on chip now.) With something in the Cortex-A class you mentioned, you would have similar edge-of-chip visibility, as those use larger, cheaper DRAM-based memory off chip rather than microcontroller-like on-chip resources.

Bottom line on your final question:

"And I would like to know if it is definitely true to say the 130 instructions long snippet is faster than the 184 instructions long implementation?"

It is definitely NOT TRUE to say the 130-instruction snippet is faster than the 184-instruction snippet. It might be faster, it might be slower, or it might be about the same. With a lot more information we might be able to make a pretty good statement, or it may still be non-deterministic. It is easy to choose 100 instructions that execute faster than 1000 instructions, and likewise easy to choose 1000 instructions that execute faster than 100 instructions (even with no branching and no loops, just linear execution).
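A minimal sketch of why instruction count alone decides nothing: the model below sums assumed per-instruction latencies for two straight-line snippets of 130 and 184 instructions. The latency table and both instruction mixes are invented for illustration, and the model ignores pipelining and overlap entirely.

```python
# Hypothetical latencies in cycles; real values vary by core and by cache state.
LATENCY = {"add": 1, "mul": 3, "load_hit": 4, "div": 25, "load_miss": 200}


def estimated_cycles(mix: dict) -> int:
    """Sum of latency * count over an instruction mix (no overlap modelled)."""
    return sum(LATENCY[op] * count for op, count in mix.items())


# Invented mixes: the shorter snippet leans on divides and cache misses.
short_snippet = {"add": 100, "div": 20, "load_miss": 10}   # 130 instructions
long_snippet = {"add": 150, "mul": 14, "load_hit": 20}     # 184 instructions

for name, mix in [("130-instruction", short_snippet), ("184-instruction", long_snippet)]:
    print(f"{name}: {sum(mix.values())} instructions, ~{estimated_cycles(mix)} cycles")
```

With different assumed mixes the ordering flips, which is the point: the listing length needs to be combined with what the instructions are and how the memory system behaves, and ideally with a measurement on the target hardware.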
