Weird program latency behavior on a VM

Problem description

I wrote a program that reads a 256 KB array, sized so that one run takes about 1 ms. The program is simple and is attached below. However, when I run it in a VM on Xen, the latency is not stable. It follows the pattern below (the totalms column is in ms; the other two columns are in TSC cycles):

    #totalCycle  CyclePerLine   totalms
       22583885          5513  6.452539
        3474342           848  0.992669
        3208486           783  0.916710
       25848572          6310  7.385306
        3225768           787  0.921648
        3210487           783  0.917282
       25974700          6341  7.421343
        3244891           792  0.927112
        3276027           799  0.936008
       25641513          6260  7.326147
        3531084           862  1.008881
        3233687           789  0.923911
       22397733          5468  6.399352
        3523403           860  1.006687
        3586178           875  1.024622
       26094384          6370  7.455538
        3540329           864  1.011523
        3812086           930  1.089167
       25907966          6325  7.402276
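The spikes in the table are regular: every third run (rows 1, 4, 7, and so on) takes about 6.4 to 7.5 ms instead of about 1 ms. A small helper like the one below, which is hypothetical and not part of the original post, flags runs slower than a multiple of the median so that the spacing of the outliers can be checked on a longer log; the default factor of 2 is an arbitrary assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Return the (zero-based) indices of runs whose latency exceeds
// `factor` times the median latency. `factor` = 2.0 is an arbitrary choice.
std::vector<std::size_t> find_outliers(const std::vector<double>& ms, double factor = 2.0)
{
    std::vector<std::size_t> out;
    if (ms.empty()) return out;

    std::vector<double> sorted(ms);
    std::nth_element(sorted.begin(), sorted.begin() + sorted.size() / 2, sorted.end());
    const double median = sorted[sorted.size() / 2];  // upper median for even sizes

    for (std::size_t i = 0; i < ms.size(); ++i)
        if (ms[i] > factor * median) out.push_back(i);
    return out;
}

// Return the gaps between consecutive outlier indices,
// e.g. {3, 3, 3} when every third run is slow.
std::vector<std::size_t> gaps(const std::vector<std::size_t>& idx)
{
    std::vector<std::size_t> g;
    for (std::size_t i = 1; i < idx.size(); ++i) g.push_back(idx[i] - idx[i - 1]);
    return g;
}
```

Fed the totalms column above, `find_outliers` returns indices 0, 3, 6, 9, 12, 15 and 18, so every gap is 3 runs. Correlating such a period with wall-clock timestamps on a longer log would help narrow down which periodic activity is involved.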

I suspect some process is doing something in the background, perhaps in an event-driven way. Has anyone encountered this before? Or can anyone point out which processes or services could cause this?

Below is my program. I ran it 1000 times; each run produced one line of the output above.

    #include <cstdio>
    #include <cstdlib>
    #include <utility>   // std::swap

    using namespace std;

    #if defined(__i386__)
    static __inline__ unsigned long long rdtsc(void)
    {
        unsigned long long int x;
        __asm__ volatile (".byte 0x0f, 0x31" : "=A" (x));
        return x;
    }
    #elif defined(__x86_64__)
    static __inline__ unsigned long long rdtsc(void)
    {
        unsigned hi, lo;
        __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
        return ( (unsigned long long)lo)|( ((unsigned long long)hi)<<32 );
    }
    #else
    #error "rdtsc() is only implemented for x86 and x86-64"
    #endif

    #define CACHE_LINE_SIZE 64

    // Assumed TSC frequency in kHz (3.5 GHz), so cycles / KHZ gives milliseconds.
    #define KHZ 3500000

    // Usage: ./a.out <wcet>
    // <wcet> is the number of outer repetitions; the buffer size is fixed at 256 KB.
    int main(int argc, char** argv)
    {
        if (argc < 2) {
            fprintf(stderr, "usage: %s <wcet>\n", argv[0]);
            return 1;
        }
        unsigned long wcet        = strtoul(argv[1], NULL, 10);
        unsigned long mem_size_KB = 256;                          // mem size in KB
        unsigned long mem_size_B  = mem_size_KB * 1024;           // mem size in bytes
        unsigned long count       = mem_size_B / sizeof(long);
        unsigned long row         = mem_size_B / CACHE_LINE_SIZE; // number of cache lines
        unsigned long col         = CACHE_LINE_SIZE / sizeof(long); // longs per cache line

        unsigned long long start, finish, dur1;
        unsigned long temp;

        long *buffer = new long[count];

        // init array
        for (unsigned long i = 0; i < count; ++i)
            buffer[i] = i;

        // Shuffle the first long of each cache line (Sattolo's algorithm), so that
        // following temp = buffer[temp] visits every line once, in random order.
        for (unsigned long i = row-1; i > 0; --i) {
            temp = rand() % i;
            swap(buffer[i*col], buffer[temp*col]);
        }

        // warm the cache again
        temp = buffer[0];
        for (unsigned long i = 0; i < row-1; ++i) {
            temp = buffer[temp];
        }

        // Timed region: the 256 KB buffer should now be cache-resident
        temp = buffer[0];
        long sum = 0;
        start = rdtsc();
        for (unsigned long wcet_i = 0; wcet_i < wcet; wcet_i++)
        {
            for (int j = 0; j < 21; j++)
            {
                for (unsigned long i = 0; i < row-1; ++i) {
                    if (i%2 == 0) sum += buffer[temp];
                    else sum -= buffer[temp];
                    temp = buffer[temp];
                }
            }
        }
        finish = rdtsc();
        dur1 = finish - start;

        // Keep the results live so an optimizing compiler cannot drop the timed loop.
        volatile long sink = sum + (long)temp;
        (void)sink;

        // Columns: total cycles; total cycles divided by the number of cache lines
        // in the buffer (not by the number of loads); elapsed time in ms.
        printf("%llu %llu %.6f\n", dur1, dur1/row, dur1*1.0/KHZ);
        delete[] buffer;
        return 0;
    }
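Note that the program's second output column is `dur1 / row`, i.e. total cycles divided by the number of cache lines in the 256 KB buffer (4096), not by the number of pointer-chasing steps. The timed loop performs `wcet * 21 * (row - 1)` steps, so a per-step figure has to divide by that count. The sketch below makes the arithmetic explicit; `steps_in_run`, `cycles_per_step` and `cycles_to_ms` are hypothetical helpers, and because the post does not say which `wcet` value was passed on the command line, `wcet = 1` in the example is only an assumption.

```cpp
#include <cstdint>

// Parameters copied from the program above.
constexpr std::uint64_t kCacheLineSize = 64;
constexpr std::uint64_t kMemSizeBytes  = 256 * 1024;
constexpr std::uint64_t kRows          = kMemSizeBytes / kCacheLineSize;  // 4096 lines
constexpr std::uint64_t kInnerRepeats  = 21;         // the `j < 21` loop
constexpr double        kTscKHz        = 3500000.0;  // KHZ: assumed 3.5 GHz TSC

// Number of pointer-chasing steps (one cache line each) in the timed region.
constexpr std::uint64_t steps_in_run(std::uint64_t wcet)
{
    return wcet * kInnerRepeats * (kRows - 1);
}

// Average TSC cycles per step, which the CyclePerLine column does not compute.
double cycles_per_step(std::uint64_t total_cycles, std::uint64_t wcet)
{
    return static_cast<double>(total_cycles) / static_cast<double>(steps_in_run(wcet));
}

// Same conversion as the program's third column: cycles / kHz = milliseconds.
double cycles_to_ms(std::uint64_t total_cycles)
{
    return static_cast<double>(total_cycles) / kTscKHz;
}
```

For example, assuming `wcet = 1`, the run that reported 3208486 cycles corresponds to 85,995 steps and roughly 37 cycles per step, while `cycles_to_ms(3208486)` reproduces the 0.916710 ms shown in the table.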


Recommended answer

Using the RDTSC instruction in a virtual machine is complicated. It is likely that the hypervisor (Xen) is emulating RDTSC by trapping it. Your fastest runs show around 800 cycles per cache line, which is very, very slow; the only explanation is that RDTSC causes a trap that is handled by the hypervisor, and that overhead is a performance bottleneck. I'm not sure about the even longer times you see periodically, but given that RDTSC is being trapped, all timing bets are off.
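One way to test the trapping hypothesis directly is to time RDTSC against itself: read the counter many times back to back and keep the smallest difference. On bare metal this is typically a few tens of ticks; if the hypervisor traps and emulates every RDTSC, the minimum should be much larger. The sketch below assumes GCC or Clang on x86/x86-64 (for `__rdtsc` from `<x86intrin.h>`, which issues the same instruction as the program's `rdtsc()`); `min_rdtsc_delta` is a hypothetical helper. Since the program above executes RDTSC only twice per run, this per-call cost also gives a rough idea of how much of each totalCycle value the timer calls themselves can account for.

```cpp
#include <cstdint>
#include <limits>
#include <x86intrin.h>  // __rdtsc (GCC/Clang, x86/x86-64 only)

// Smallest observed difference between two consecutive RDTSC reads over
// `iterations` attempts (iterations should be positive). A small minimum
// suggests RDTSC executes natively; a much larger one is consistent with
// RDTSC being trapped and emulated.
std::uint64_t min_rdtsc_delta(int iterations)
{
    std::uint64_t best = std::numeric_limits<std::uint64_t>::max();
    for (int i = 0; i < iterations; ++i) {
        std::uint64_t a = __rdtsc();
        std::uint64_t b = __rdtsc();
        if (b >= a && b - a < best)  // ignore a (rare) backwards step
            best = b - a;
    }
    return best;
}
```

Calling `min_rdtsc_delta(1000)` inside the Xen guest and on a bare-metal machine and comparing the two results shows whether RDTSC is being intercepted.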

You can read more here:

http://xenbits.xen.org/docs/4.2-testing/misc/tscmode.txt

Instructions in the rdtsc family are non-privileged, but privileged software may set a cpuid bit to cause all rdtsc family instructions to trap. This trap can be detected by Xen, which can then transparently "emulate" the results of the rdtsc instruction and return control to the code following the rdtsc instruction
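When RDTSC is emulated, the rate at which the guest's TSC advances may be determined by Xen rather than by the host CPU, so the program's hard-coded `KHZ` (3.5 GHz) conversion is an assumption worth checking. A minimal sketch, assuming Linux/POSIX `clock_gettime(CLOCK_MONOTONIC)` and GCC/Clang's `__rdtsc`, estimates the TSC rate in kHz against wall-clock time; `estimate_tsc_khz` is a hypothetical helper and the default 100 ms window is an arbitrary choice.

```cpp
#include <cstdint>
#include <time.h>
#include <x86intrin.h>  // __rdtsc (GCC/Clang, x86/x86-64 only)

// Current CLOCK_MONOTONIC time in nanoseconds (POSIX).
static std::uint64_t monotonic_ns()
{
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return static_cast<std::uint64_t>(ts.tv_sec) * 1000000000ULL
         + static_cast<std::uint64_t>(ts.tv_nsec);
}

// Estimate TSC ticks per millisecond (i.e. the TSC rate in kHz) by
// busy-waiting for roughly `window_ms` of wall-clock time (window_ms > 0).
double estimate_tsc_khz(unsigned window_ms = 100)
{
    const std::uint64_t t0 = monotonic_ns();
    const std::uint64_t c0 = __rdtsc();
    std::uint64_t t1;
    do {
        t1 = monotonic_ns();
    } while (t1 - t0 < window_ms * 1000000ULL);
    const std::uint64_t c1 = __rdtsc();

    const double elapsed_ms = static_cast<double>(t1 - t0) / 1e6;
    return static_cast<double>(c1 - c0) / elapsed_ms;
}
```

If the estimate inside the guest is far from 3,500,000, the totalms column is being computed with the wrong scale factor.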

By the way, tscmode.txt is wrong on one point: the hypervisor doesn't set a cpuid bit to make RDTSC trap; trapping is enabled by bit 2 of Control Register 4 (CR4.TSD):

http://en.wikipedia.org/wiki/Control_register#CR4
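The CR4.TSD mechanism can be observed from user space on Linux: `prctl(PR_SET_TSC, PR_TSC_SIGSEGV)` asks the kernel to set CR4.TSD whenever the caller is scheduled, after which RDTSC raises SIGSEGV instead of returning a value. The sketch below assumes Linux on x86/x86-64 with GCC or Clang; `rdtsc_faults_when_disabled` is a hypothetical helper. It only demonstrates the native kernel mechanism and does not show how Xen itself configures trapping for its guests.

```cpp
#include <setjmp.h>
#include <signal.h>
#include <sys/prctl.h>
#include <x86intrin.h>  // __rdtsc (GCC/Clang, x86/x86-64 only)

static sigjmp_buf g_jump;

static void on_segv(int)
{
    siglongjmp(g_jump, 1);  // unwind out of the faulting RDTSC
}

// Returns 1 if RDTSC faulted after disabling it via PR_SET_TSC,
// 0 if it still executed normally, and -1 if PR_SET_TSC is unsupported.
int rdtsc_faults_when_disabled()
{
    if (prctl(PR_SET_TSC, (unsigned long)PR_TSC_SIGSEGV, 0UL, 0UL, 0UL) != 0)
        return -1;

    struct sigaction sa = {}, old = {};
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    int faulted = 0;
    if (sigsetjmp(g_jump, 1) == 0) {               // 1: also restore the signal mask
        volatile unsigned long long t = __rdtsc(); // faults while CR4.TSD is set
        (void)t;
    } else {
        faulted = 1;
    }

    sigaction(SIGSEGV, &old, nullptr);                              // restore handler
    prctl(PR_SET_TSC, (unsigned long)PR_TSC_ENABLE, 0UL, 0UL, 0UL); // re-enable RDTSC
    return faulted;
}
```

On a typical x86 Linux system this is expected to return 1.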
