How to count the number of CPU clock cycles between the start and end of a benchmark in gem5?

Question

How to count the number of CPU clock cycles between the start and end of a benchmark in gem5?

I'm interested in all of the following cases:

  • full system userland benchmark. Maybe the m5 guest tool has a way to do it?

  • bare metal benchmark. When gem5 exits it dumps the stats automatically, so the main question is how to skip the cycles spent in the bootloader and go straight to the benchmark itself.

    Is there a way besides modifying the benchmark source with instrumentation instructions? How to write those instrumentation instructions in detail?

  • syscall emulation benchmark. I think gem5 just outputs stats.txt at the end of the run, and then you can just grep system.cpu.numCycles, but I have yet to confirm it. I'm currently blocked on: How to solve "FATAL: kernel too old" when running gem5 in syscall emulation SE mode?

I want to use it to learn:

  • how CPUs work
  • how to optimize assembly code or compiler settings to run optimally on a given CPU

Recommended answer

m5 tool

A good approximation is to run, ideally from a shell script that is the /init program:

m5 resetstats
run-benchmark
m5 dumpstats

Then on the host:

grep -E '^system.cpu.numCycles ' m5out/stats.txt

which gives something like:

system.cpu.numCycles                      33942872680                       # number of cpu cycles simulated

Note that if you restore from an m5 checkpoint with a different CPU, e.g.:

--restore-with-cpu=HPI --caches

then you need to grep for a different identifier:

grep -E '^system.switch_cpus.numCycles ' m5out/stats.txt

resetstats zeroes out the cumulative stats, and dumpstats dumps what has been collected during the benchmark.
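
Since every dump is appended to the same stats.txt, a run that calls m5 dumpstats and then exits should leave more than one section in the file. Here is a minimal sketch of a host-side helper that prints one stat per section. The function name stat_per_dump is made up for illustration, and it assumes each section starts with a line containing "Begin Simulation Statistics":

```shell
#!/bin/sh
# Print the value of one stat for every dump section in a gem5 stats.txt.
# Usage: stat_per_dump STAT_NAME [STATS_FILE]
# Assumption: each dump section starts with a marker line containing
# "Begin Simulation Statistics". The helper name is hypothetical.
stat_per_dump() {
    awk -v name="$1" '
        /Begin Simulation Statistics/ { dump++ }   # a new dump section starts
        $1 == name { print dump ": " $2 }          # exact match on the stat name
    ' "${2:-m5out/stats.txt}"
}
```

For example, stat_per_dump system.cpu.numCycles prints one "section: value" line per dump, and stat_per_dump system.switch_cpus.numCycles does the same after restoring with a different CPU. With the guest script above, the section written by m5 dumpstats is the one that should cover the benchmark.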

This is not perfect, since some time passes between m5 resetstats finishing and the benchmark starting, and again between the benchmark finishing and the exec syscall for m5 dumpstats completing, but if the benchmark runs long enough, this shouldn't matter.

http://arm.ecs.soton.ac.uk/wp-content/uploads/2016/10/gem5_tutorial.pdf also proposes a few more heuristics:

#!/bin/sh
# Wait for system to calm down
sleep 10
# Take a checkpoint in 100000 ns
m5 checkpoint 100000
# Reset the stats
m5 resetstats
run-benchmark
# Exit the simulation
m5 exit

m5 exit also works, since gem5 dumps the stats when it finishes.

Instrumentation instructions

Sometimes it seems unavoidable to modify the benchmark source code a bit with those instructions in order to:

  • skip initialization and go directly to the steady state
  • evaluate individual runs of the main loop

You can of course deduce those instructions from the gem5 m5 tool source code, but here are some very easy to re-use one-line copy-pastes for arm and aarch64, e.g. for aarch64:

/* resetstats */
__asm__ __volatile__ ("mov x0, #0; mov x1, #0; .inst 0xFF000110 | (0x40 << 16);" : : : "x0", "x1");
/* dumpstats */
__asm__ __volatile__ ("mov x0, #0; mov x1, #0; .inst 0xFF000110 | (0x41 << 16);" : : : "x0", "x1");

The m5 tool uses the same mechanism under the hood, but by adding the instructions directly into the source we avoid the syscall, so the measurement is more precise and representative (at the cost of more manual work).

However, to ensure that the compiler does not reorder the assembly around your region of interest (ROI), you might want to use the techniques mentioned at: Enforcing statement order in C++

Address monitoring

Another technique that can be used is to monitor addresses of interest instead of adding magic instructions to the source.

E.g., if you know that a benchmark starts at PC == 0x400, it should be possible to do something when that address is hit.

To find the addresses of interest, you would have to use, for example, readelf, gdb, or tracing, and, if running full system on top of Linux, ensure that ASLR is turned off.
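
For the readelf route, a minimal sketch of pulling one symbol's address out of the symbol table could look like this. The helper name symbol_address is made up for illustration, and it assumes the usual readelf -s column layout (Num, Value, Size, Type, Bind, Vis, Ndx, Name):

```shell
#!/bin/sh
# Read "readelf -s" output on stdin and print the address (Value column)
# of the first symbol whose name matches $1 exactly.
# Assumed column layout: Num: Value Size Type Bind Vis Ndx Name
# The helper name is hypothetical.
symbol_address() {
    awk -v sym="$1" '$8 == sym { print "0x" $2; exit }'
}
# Typical use on the host (the binary path is a placeholder):
#   readelf -s ./benchmark.out | symbol_address main
```

This only gives the link-time address, so it is meaningful as-is for statically linked, non-relocated executables, which is also why ASLR has to be off when running under Linux.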

This technique would be the least intrusive one, but the setup is harder, and to be honest I haven't done it yet. One day, one day.
