How to benchmark in Qemu i386 system using rdtsc

Question

I am trying to measure the number of clock cycles taken to complete the same operation implemented in two different programming languages, in the same environment (bare metal, without an OS).

Currently I am using the Qemu i386 emulator and rdtsc to measure the clock cycles.

/* Return the number of CPU ticks since boot. */
static inline u64 rdtsc(void)
{
    u32 hi, lo;
    // asm("cpuid");
    /* volatile stops the compiler from merging or dropping repeated reads */
    asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
    return ((u64) lo) | (((u64) hi) << 32);
}

Taking the difference between the rdtsc values before and after the operation should give the number of clock cycles.

    start_time = rdtsc();
    operation();
    stop_time = rdtsc();
    num_cycles = stop_time-start_time;

But the difference is not constant, even over hundreds of iterations, and it varies by a few thousand cycles.

  • Is there any better way of measuring clock cycles?

Also, is there any way of providing the frequency as an input parameter in Qemu? Currently I am using:

qemu-system-i386 -kernel out.elf

Recommended answer

Trying to benchmark guest software under QEMU emulation is at best extremely difficult. QEMU's emulation does not have performance characteristics that are anything like a real hardware CPU's: some operations that are fast on hardware, like floating point, are very slow on QEMU; we don't model caches and you won't see anything like the performance curves you would see as data sets reach cache line or L1/L2/etc cache size limits; and so on.

Important factors in performance on a modern CPU include (at least):

  • Raw count of instructions executed
  • TLB misses
  • Branch predictor misses
  • Cache misses

QEMU doesn't track any of the last three and only makes a vague attempt at the first one if you use the -icount option. (In particular, without -icount the RDTSC value we provide to the guest under emulation is more-or-less just the host CPU RDTSC value, so times measured with it will include all sorts of QEMU overhead including time spent translating guest code.)
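
To get a feel for what -icount does to guest time, here is a rough sketch in C. It assumes the documented behaviour of -icount shift=N, where the virtual CPU executes one instruction every 2^N ns of virtual time. The helper names are made up for illustration, and how closely the guest-visible TSC follows this virtual clock is something to verify on your QEMU version.

```c
#include <stdint.h>

/* Hypothetical helper: virtual nanoseconds charged for a number of guest
 * instructions, assuming one instruction per 2^shift ns (-icount shift=N).
 * shift is expected to be small (QEMU documents values from 0 to 10). */
static uint64_t icount_virtual_ns(uint64_t instructions, unsigned shift)
{
    return instructions << shift;
}

/* Hypothetical helper: nominal guest instruction rate per virtual second
 * implied by the same assumption. */
static uint64_t icount_instr_per_second(unsigned shift)
{
    return UINT64_C(1000000000) >> shift;
}
```

Under that assumption, shift=0 corresponds to a nominal 1,000,000,000 instructions per virtual second, which is about as close as QEMU gets to a frequency parameter, for example: qemu-system-i386 -kernel out.elf -icount shift=0. This counts instructions, not real cycles, so it only addresses the first factor in the list above.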

Assuming you're on an x86 host, you could try the -enable-kvm option to run this under a KVM virtual machine. Then at least you'll be looking at the real performance of a hardware CPU, though you will still see some noise from the overhead as other host processes contend for CPU with the VM.
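
Since some noise remains even under KVM or on real hardware, one common way to cope is to repeat the measurement and keep the smallest value. The following is a minimal sketch of that idea, not something from the original answer: min_cycles, tick_fn, op_fn, fake_ticks and nop_op are hypothetical names, and rdtsc_serialized simply issues CPUID before RDTSC (the step that is commented out in the question's code) so earlier instructions finish before the counter is read.

```c
#include <stdint.h>

/* Hypothetical types: a tick source and the operation under test. */
typedef uint64_t (*tick_fn)(void);
typedef void (*op_fn)(void);

#if defined(__i386__) || defined(__x86_64__)
/* CPUID is a serializing instruction, so RDTSC is not executed until
 * earlier instructions have completed. CPUID overwrites EBX and ECX,
 * hence the clobber list. */
static inline uint64_t rdtsc_serialized(void)
{
    uint32_t lo, hi;
    __asm__ volatile("cpuid\n\t"
                     "rdtsc"
                     : "=a"(lo), "=d"(hi)
                     : "0"(0)
                     : "ebx", "ecx", "memory");
    return ((uint64_t)hi << 32) | lo;
}
#endif

/* Run op() `iterations` times and return the smallest elapsed tick count.
 * The minimum is the run that was disturbed the least. */
static uint64_t min_cycles(tick_fn now, op_fn op, unsigned iterations)
{
    uint64_t best = UINT64_MAX;
    for (unsigned i = 0; i < iterations; i++) {
        uint64_t start = now();
        op();
        uint64_t elapsed = now() - start;
        if (elapsed < best)
            best = elapsed;
    }
    return best;
}

/* Deterministic stand-in tick source for trying the harness off-target:
 * successive calls advance by 100, 150, 200, 100, ... ticks. */
static uint64_t fake_ticks(void)
{
    static uint64_t t, calls;
    t += 100 + (calls++ % 3) * 50;
    return t;
}

/* Empty operation used with the stand-in tick source. */
static void nop_op(void)
{
}
```

On the target you would call something like min_cycles(rdtsc_serialized, operation, 1000). The result still includes the cost of the timing instructions themselves, and under plain emulation without KVM it still includes the QEMU overhead described above.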
