Why is the number of instructions non-deterministic in Linux performance counters?

Question

I want to profile the runtime of applications whose binaries will actually be run under a simulator (NS-3/DCE). To do so I wanted to use the Linux performance counters, expecting the instruction count of an application with no source of non-determinism to be deterministic. According to the Linux performance counters, I couldn't be more wrong. Take a simple example:

$ (perf stat -c -- sleep 1 2>&1 && perf stat -c -- sleep 1 2>&1) |grep instructions
        669218 instructions              #    0,61  insns per cycle
        682286 instructions              #    0,58  insns per cycle

1) What is the source of this non-determinism? Does it stem from low-level branch prediction and other engines in the CPU?

2) Is there a way to know the number of instructions fed to the CPU (as opposed to the instruction count in the example output), so that the amount of executed code can be measured deterministically?

Recommended answer

Summary:

1) The non-determinism is caused by variation in the sleep 1 command, not by branch prediction or other microarchitectural features.

2) You can find the number of instructions fetched by using a hardware event counter, if your CPU supports one. However, this will vary more than the number of instructions retired (which is what perf typically reports for instructions).

Details:

The sleep command is not a good test case if you want a deterministic number of instructions to execute. It executes a non-deterministic number of instructions because there is some slight variation in what the kernel does on each run.

You can specify whether to collect user-mode or kernel-mode instruction counts with the event modifiers instructions:u (user mode) or instructions:k (kernel mode). For two runs of:

perf stat -e instructions:k,instructions:u,instructions sleep 1

I get the following results:

Performance counter stats for 'sleep 1':

       373,044 instructions:k            #    0.00  insns per cycle        
       199,795 instructions:u            #    0.00  insns per cycle        
       572,839 instructions              #    0.00  insns per cycle        

   1.001018153 seconds time elapsed

Performance counter stats for 'sleep 1':

       379,722 instructions:k            #    0.00  insns per cycle        
       199,970 instructions:u            #    0.00  insns per cycle        
       579,519 instructions              #    0.00  insns per cycle        

   1.000986201 seconds time elapsed

As you can see, the actual elapsed time of sleep 1 varies slightly, which is the source of the non-determinism. However, the number of user-mode instructions varies less than the number of kernel-mode instructions.
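
To put numbers on that comparison, here is a minimal Python sketch that parses the instruction counts from the two runs shown in this answer and prints how far apart they are. The helper names (parse_counts, relative_spread) and the choice of "difference divided by the mean" as the measure are assumptions made for illustration, not part of perf; the counts themselves are copied from the output in this answer.

```python
import re

# Counts copied from the two `perf stat` runs shown in this answer.
RUN_1 = """
       373,044 instructions:k
       199,795 instructions:u
       572,839 instructions
"""
RUN_2 = """
       379,722 instructions:k
       199,970 instructions:u
       579,519 instructions
"""

# A count (optionally with thousands separators) followed by the event name.
LINE = re.compile(r"^\s*([\d,]+)\s+(instructions(?::[uk])?)\b", re.MULTILINE)


def parse_counts(text):
    """Return {event_name: count} for the instruction events in the text."""
    return {name: int(num.replace(",", "")) for num, name in LINE.findall(text)}


def relative_spread(a, b):
    """Absolute difference between two counts as a fraction of their mean."""
    return abs(a - b) / ((a + b) / 2)


first, second = parse_counts(RUN_1), parse_counts(RUN_2)
for event in first:
    spread = relative_spread(first[event], second[event])
    print(f"{event:16s} {first[event]:>8d} {second[event]:>8d} {spread:.3%}")
```

On these two runs the kernel-mode count differs by roughly 1.8%, while the user-mode count differs by about 0.09%. Two runs are a very small sample, so treat these figures as an illustration only.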
