How to count the number of CPU clock cycles between the start and end of a benchmark in gem5?
Question
I'm interested in all of the following cases:
- full system userland benchmark. Maybe the m5 guest tool has a way to do it?
- bare metal benchmark. When gem5 exits it dumps the stats automatically, so the main question is how to skip the cycles spent in the bootloader and go straight to the benchmark itself.
  Is there a way besides modifying the benchmark source with instrumentation instructions? How to write those instrumentation instructions in detail?
- syscall emulation benchmark. I think gem5 just outputs stats.txt at the end of the run, and then you can just grep system.cpu.numCycles, but I have to confirm it. Currently blocked on: How to solve "FATAL: kernel too old" when running gem5 in syscall emulation SE mode?
I want to use this to learn:
- how CPUs work
- how to optimize assembly code or compiler settings to run optimally on a given CPU
Recommended answer
m5 tool
A good approximation is to run the following, ideally from a shell script that serves as the /init program:
m5 resetstats
run-benchmark
m5 dumpstats
Then, on the host:
grep -E '^system.cpu.numCycles ' m5out/stats.txt
gives something like:
system.cpu.numCycles 33942872680 # number of cpu cycles simulated
Note that if you restore from an m5 checkpoint with a different CPU, e.g.:
--restore-with-cpu=HPI --caches
then you need to grep for a different identifier:
grep -E '^system.switch_cpus.numCycles ' m5out/stats.txt
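If you would rather extract the number programmatically than with grep, the same lookup can be sketched as a small host-side C helper. This is only an illustration under stated assumptions: the function names parse_stat_line and read_stat are hypothetical (they are not part of gem5), and the code assumes each stat sits on its own line as "name value # comment", as in the output shown above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of gem5.
 * Parse one line of stats.txt. Returns 1 and stores the value in *out if the
 * line starts with `name` followed by whitespace and an unsigned integer,
 * and returns 0 otherwise.
 */
static int parse_stat_line(const char *line, const char *name,
                           unsigned long long *out)
{
    size_t n = strlen(name);

    if (strncmp(line, name, n) != 0)
        return 0;
    /* Require whitespace after the name, like the trailing space in the
     * grep pattern, so that a longer stat name does not match. */
    if (line[n] != ' ' && line[n] != '\t')
        return 0;
    return sscanf(line + n, "%llu", out) == 1;
}

/*
 * Hypothetical helper, not part of gem5.
 * Scan the file at `path` for the stat called `name`. Returns 1 if at least
 * one matching line was found, 0 otherwise. If the file contains several
 * matching lines, *out holds the value from the last one.
 */
static int read_stat(const char *path, const char *name,
                     unsigned long long *out)
{
    char line[1024];
    int found = 0;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return 0;
    while (fgets(line, sizeof line, f) != NULL) {
        if (parse_stat_line(line, name, out))
            found = 1;
    }
    fclose(f);
    return found;
}
```

A call such as read_stat("m5out/stats.txt", "system.cpu.numCycles", &cycles) would then fill cycles, and passing "system.switch_cpus.numCycles" instead covers the checkpoint restore case.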
resetstats zeroes out the cumulative stats, and dumpstats dumps what has been collected during the benchmark.
This is not perfect, since some time passes between the exec syscall for m5 resetstats finishing and the benchmark starting, but if the benchmark is long enough, this shouldn't matter.
http://arm.ecs.soton.ac.uk/wp-content/uploads/2016/10/gem5_tutorial.pdf also proposes a few more heuristics:
#!/bin/sh
# Wait for system to calm down
sleep 10
# Take a checkpoint in 100000 ns
m5 checkpoint 100000
# Reset the stats
m5 resetstats
run-benchmark
# Exit the simulation
m5 exit
m5 exit also works, since gem5 dumps stats when it finishes.
Instrumentation instructions
Sometimes it seems inevitable that you have to modify the input source code a bit with these instructions in order to:
- skip initialization and go directly to steady state
- evaluate individual main loop runs
You can of course deduce those instructions from the gem5 m5 tool source code, but here are some easy to reuse one-line copy-pastes for arm and aarch64, e.g. for aarch64:
/* resetstats */
__asm__ __volatile__ ("mov x0, #0; mov x1, #0; .inst 0xFF000110 | (0x40 << 16);" : : : "x0", "x1");
/* dumpstats */
__asm__ __volatile__ ("mov x0, #0; mov x1, #0; .inst 0xFF000110 | (0x41 << 16);" : : : "x0", "x1");
The m5 tool uses the same mechanism under the hood, but by adding the instructions directly into the source we avoid the syscall, so the measurement is more precise and representative (at the cost of more manual work).
However, to ensure that the compiler does not reorder the assembly around your ROI (region of interest), you might want to use the techniques mentioned at: Enforcing statement order in C++
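One common way to limit such reordering with GCC or Clang is an empty inline assembly statement with a "memory" clobber, which emits no instruction but stops the compiler from moving memory accesses across it. The sketch below is an illustration only: COMPILER_BARRIER, ROI_BEGIN, ROI_END and sum_array are hypothetical names, and the two ROI macros are plain barriers here so the code builds on any host. When building for gem5 on aarch64, you would presumably place the resetstats and dumpstats instructions inside them.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang compiler barrier: emits no instruction, but the "memory" clobber
 * tells the compiler that memory may change here, so loads and stores are
 * not moved across this point. It does not constrain pure register
 * arithmetic. */
#define COMPILER_BARRIER() __asm__ __volatile__("" : : : "memory")

/* Hypothetical ROI markers. Here they are only barriers; for a gem5 aarch64
 * build, the magic resetstats/dumpstats instructions would go in here. */
#define ROI_BEGIN() COMPILER_BARRIER()
#define ROI_END()   COMPILER_BARRIER()

/* Hypothetical benchmark kernel: sum an array inside the ROI. */
static unsigned long sum_array(const uint32_t *data, size_t n)
{
    unsigned long sum = 0;

    ROI_BEGIN();
    for (size_t i = 0; i < n; i++)
        sum += data[i];
    /* Mark `sum` as read and written by an empty asm statement so the
     * compiler has to finish computing it before the end marker. */
    __asm__ __volatile__("" : "+r"(sum) : : "memory");
    ROI_END();
    return sum;
}
```

The "+r" operand is there because the memory clobber alone says nothing about values held in registers. Tying the accumulator to an asm statement placed before ROI_END is one way to keep the computation inside the measured region.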
Address monitoring
Another technique is to monitor addresses of interest instead of adding magic instructions to the source.
E.g., if you know that a benchmark starts at PC == 0x400, it should be possible to do something when that address is hit.
To find the addresses of interest, you would have to use, for example, readelf, gdb or tracing, and, if running full system on top of Linux, ensure that ASLR is turned off.
This technique would be the least intrusive one, but the setup is harder, and to be honest I haven't done it yet. One day, one day.