在具有constant_tsc和nonstop_tsc的CPU上,为什么我的时间漂移​​了? [英] On a cpu with constant_tsc and nonstop_tsc, why does my time drift?

查看:319
本文介绍了在具有constant_tsc和nonstop_tsc的CPU上,为什么我的时间漂移​​了?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用constant_tscnonstop_tsc

$ grep -m 1 ^flags /proc/cpuinfo | sed 's/ /\n/g' | egrep "constant_tsc|nonstop_tsc"
constant_tsc
nonstop_tsc

第1步::计算tsc的变动率:

Step 1: Calculate the tick rate of the tsc:

我将_ticks_per_ns计算为许多观察值的中位数.我使用rdtscp来确保按顺序执行.

I calculate _ticks_per_ns as the median over a number of observations. I use rdtscp to ensure in-order execution.

static const int trials = 13;
std::array<double, trials> rates;

for (int i = 0; i < trials; ++i)
{
    timespec beg_ts, end_ts;
    uint64_t beg_tsc, end_tsc;

    clock_gettime(CLOCK_MONOTONIC, &beg_ts);
    beg_tsc = rdtscp();

    uint64_t elapsed_ns;
    do
    {
        clock_gettime(CLOCK_MONOTONIC, &end_ts);
        end_tsc = rdtscp();

        elapsed_ns = to_ns(end_ts - beg_ts); // calculates ns between two timespecs
    }
    while (elapsed_ns < 10 * 1e6); // busy spin for 10ms

    rates[i] = (double)(end_tsc - beg_tsc) / (double)elapsed_ns;
}

std::nth_element(rates.begin(), rates.begin() + trials/2, rates.end());

_ticks_per_ns = rates[trials/2];

步骤2:计算开始的挂钟时间和tsc

Step 2: Calculate starting wall clock time and tsc

uint64_t beg, end;
timespec ts;

// loop to ensure we aren't interrupted between the two tsc reads
while (1)
{
    beg = rdtscp();
    clock_gettime(CLOCK_REALTIME, &ts);
    end = rdtscp();

    if ((end - beg) <= 2000) // max ticks per clock call
        break;
}

_start_tsc        = end;
_start_clock_time = to_ns(ts); // converts timespec to ns since epoch

步骤3:创建一个可以从tsc返回挂钟时间的函数

Step 3: Create a function which can return wall clock time from the tsc

uint64_t tsc_to_ns(uint64_t tsc)
{
    int64_t diff = tsc - _start_tsc;
    return _start_clock_time + (diff / _ticks_per_ns);
}

步骤4:循环运行,从clock_gettimerdtscp

// lock the test to a single core
cpu_set_t mask;
CPU_ZERO(&mask);
CPU_SET(6, &mask);
sched_setaffinity(0, sizeof(cpu_set_t), &mask);

while (1)
{
    timespec utc_now;
    clock_gettime(CLOCK_REALTIME, &utc_now);
    uint64_t utc_ns = to_ns(utc_now);
    uint64_t tsc_ns = tsc_to_ns(rdtscp());

    uint64_t ns_diff = tsc_ns - utc_ns;

    std::cout << "clock_gettime " << ns_to_str(utc_ns) << '\n';
    std::cout << "tsc_time      " << ns_to_str(tsc_ns) << " diff=" << ns_diff << "ns\n";

    sleep(10);
}

输出:

clock_gettime 11:55:34.824419837
tsc_time      11:55:34.824419840 diff=3ns
clock_gettime 11:55:44.826260245
tsc_time      11:55:44.826260736 diff=491ns
clock_gettime 11:55:54.826516358
tsc_time      11:55:54.826517248 diff=890ns
clock_gettime 11:56:04.826683578
tsc_time      11:56:04.826684672 diff=1094ns
clock_gettime 11:56:14.826853056
tsc_time      11:56:14.826854656 diff=1600ns
clock_gettime 11:56:24.827013478
tsc_time      11:56:24.827015424 diff=1946ns

问题:

很快很明显,用这两种方式计算出的时间会迅速漂移.

It is quickly evident that the times calculated in these two ways rapidly drift apart.

我假设对于constant_tscnonstop_tsc,tsc速率是恒定的.

I'm assuming that with constant_tsc and nonstop_tsc that the tsc rate is constant.

  • 这是正在漂移的车载时钟吗?当然不会以这种速度漂移吗?

  • Is this the on board clock that is drifting? Surely it doesn't drift at this rate?

这种漂移的原因是什么?

What is the cause of this drift?

有什么我可以做的来保持它们同步(除了在步骤2中非常频繁地重新计算_start_tsc_start_clock_time之外)?

Is there anything I can do to keep them in sync (other than very frequently recalculating _start_tsc and _start_clock_time in step 2)?

推荐答案

在OP中看到漂移的原因,至少在我的机器上,是因为每ns的TSC滴答偏离了其原始值_ticks_per_ns .以下是来自此计算机的结果:

The reason for the drift seen in the OP, at least on my machine, is that the TSC ticks per ns drifts away from its original value of _ticks_per_ns. The following results were from this machine:

don@HAL:~/UNIX/OS/3EZPcs/Ch06$ uname -a
Linux HAL 4.4.0-81-generic #104-Ubuntu SMP Wed Jun 14 08:17:06 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
don@HAL:~/UNIX/OS/3EZPcs/Ch06$  cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc

cat /proc/cpuinfo显示constant_tscnonstop_tsc标志.

viewRates.cc可以运行以查看计算机上每ns的当前TSC刻度:

viewRates.cc can be run to see the current TSC Ticks per ns on a machine:

rdtscp.h:

static inline unsigned long rdtscp_start(void) {
  unsigned long var;
  unsigned int hi, lo;

  __asm volatile ("cpuid\n\t"
          "rdtsc\n\t" : "=a" (lo), "=d" (hi)
          :: "%rbx", "%rcx");

  var = ((unsigned long)hi << 32) | lo;
  return (var);
}

static inline unsigned long rdtscp_end(void) {
  unsigned long var;
  unsigned int hi, lo;

  __asm volatile ("rdtscp\n\t"
          "mov %%edx, %1\n\t"
          "mov %%eax, %0\n\t"
          "cpuid\n\t"  : "=r" (lo), "=r" (hi)
          :: "%rax", "%rbx", "%rcx", "%rdx");

  var = ((unsigned long)hi << 32) | lo;
  return (var);
  }

/*see https://www.intel.com/content/www/us/en/embedded/training/ia-32-ia-64-benchmark-code-execution-paper.html
 */

viewRates.cc:

viewRates.cc:

#include <time.h>
#include <unistd.h>
#include <iostream>
#include <iomanip>
#include <cstdlib>
#include "rdtscp.h"
using std::cout;  using std::cerr;  using std::endl;

#define CLOCK CLOCK_REALTIME

uint64_t to_ns(const timespec &ts);   // Converts a struct timespec to ns (since epoch).
void view_ticks_per_ns(int runs =10, int sleep =10);

int main(int argc, char **argv) {
  int runs = 10, sleep = 10;
  if (argc != 1 && argc != 3) {
    cerr << "Usage: " << argv[0] << " [ RUNS SLEEP ] \n";
    exit(1);
  } else if (argc == 3) {
    runs = std::atoi(argv[1]);
    sleep = std::atoi(argv[2]);
  }

  view_ticks_per_ns(runs, sleep); 
}

  void view_ticks_per_ns(int RUNS, int SLEEP) {
// Prints out stream of RUNS tsc ticks per ns, each calculated over a SLEEP secs interval.
  timespec clock_start, clock_end;
  unsigned long tsc1, tsc2, tsc_start, tsc_end;
  unsigned long elapsed_ns, elapsed_ticks;
  double rate; // ticks per ns from each run.

  clock_getres(CLOCK, &clock_start);
  cout <<  "Clock resolution: " << to_ns(clock_start) << "ns\n\n";

  cout << " tsc ticks      " << "ns      " << " tsc ticks per ns\n";
  for (int i = 0; i < RUNS; ++i) {
    tsc1 = rdtscp_start();
    clock_gettime(CLOCK, &clock_start);
    tsc2 = rdtscp_end();                      
    tsc_start = (tsc1 + tsc2) / 2;

    sleep(SLEEP);

    tsc1 = rdtscp_start();
    clock_gettime(CLOCK, &clock_end);
    tsc2 = rdtscp_end();                     
    tsc_end = (tsc1 + tsc2) / 2;

    elapsed_ticks = tsc_end - tsc_start;
    elapsed_ns = to_ns(clock_end) - to_ns(clock_start);
    rate = static_cast<double>(elapsed_ticks) / elapsed_ns;

    cout << elapsed_ticks << " " << elapsed_ns << " " << std::setprecision(12) << rate << endl;
  } 
}

linearExtrapolator.cc可以运行以重新创建OP的实验:

linearExtrapolator.cc can be run to re-create the experiment of the OP:

linearExtrapolator.cc:

linearExtrapolator.cc:

#include <time.h>
#include <unistd.h>
#include <iostream>
#include <iomanip>
#include <algorithm>
#include <array>
#include "rdtscp.h"

using std::cout;  using std::endl;  using std::array;

#define CLOCK CLOCK_REALTIME

uint64_t to_ns(const timespec &ts);   // Converts a struct timespec to ns (since epoch).
void set_ticks_per_ns(bool set_rate); // Display or set tsc ticks per ns, _ticks_per_ns.
void get_start();             // Sets the 'start' time point: _start_tsc[in ticks] and _start_clock_time[in ns].
uint64_t tsc_to_ns(uint64_t tsc);     // Convert tsc ticks since _start_tsc to ns (since epoch) linearly using
                                      // _ticks_per_ns with origin(0) at the 'start' point set by get_start().

uint64_t _start_tsc, _start_clock_time; // The 'start' time point as both tsc tick number, start_tsc, and as
                                        // clock_gettime ns since epoch as _start_clock_time.
double _ticks_per_ns;                   // Calibrated in set_ticks_per_ns()

int main() {
  set_ticks_per_ns(true); // Set _ticks_per_ns as the initial TSC ticks per ns.

  uint64_t tsc1, tsc2, tsc_now, tsc_ns, utc_ns;
  int64_t ns_diff;
  bool first_pass{true};
  for (int i = 0; i < 10; ++i) {
    timespec utc_now;
    if (first_pass) {
      get_start(); //Get start time in both ns since epoch (_start_clock_time), and tsc tick number(_start_tsc)
      cout << "_start_clock_time: " <<  _start_clock_time << ", _start_tsc: " << _start_tsc << endl;
      utc_ns = _start_clock_time;
      tsc_ns = tsc_to_ns(_start_tsc);   // == _start_clock_time by definition.
      tsc_now = _start_tsc;
      first_pass = false;
    } else {
      tsc1 = rdtscp_start();
      clock_gettime(CLOCK, &utc_now);
      tsc2 = rdtscp_end();
      tsc_now = (tsc1 + tsc2) / 2;
      tsc_ns = tsc_to_ns(tsc_now);
      utc_ns = to_ns(utc_now);
    }

    ns_diff = tsc_ns - (int64_t)utc_ns;

    cout << "elapsed ns: " << utc_ns - _start_clock_time << ", elapsed ticks: " << tsc_now - _start_tsc 
     << ", ns_diff: " << ns_diff << '\n' << endl;

    set_ticks_per_ns(false);  // Display current TSC ticks per ns (does not alter original _ticks_per_ns).
  }
}

void set_ticks_per_ns(bool set_rate) {
  constexpr int RUNS {1}, SLEEP{10};
  timespec clock_start, clock_end;
  uint64_t tsc1, tsc2, tsc_start, tsc_end;
  uint64_t elapsed_ns[RUNS], elapsed_ticks[RUNS];
  array<double, RUNS> rates; // ticks per ns from each run.

  if (set_rate) {
    clock_getres(CLOCK, &clock_start);
    cout <<  "Clock resolution: " << to_ns(clock_start) << "ns\n";
  }

  for (int i = 0; i < RUNS; ++i) {
    tsc1 = rdtscp_start();
    clock_gettime(CLOCK, &clock_start);
    tsc2 = rdtscp_end();                      
    tsc_start = (tsc1 + tsc2) / 2;

    sleep(SLEEP);

    tsc1 = rdtscp_start();
    clock_gettime(CLOCK, &clock_end);
    tsc2 = rdtscp_end();                     
    tsc_end = (tsc1 + tsc2) / 2;

    elapsed_ticks[i] = tsc_end - tsc_start;
    elapsed_ns[i] = to_ns(clock_end) - to_ns(clock_start);
    rates[i] = static_cast<double>(elapsed_ticks[i]) / elapsed_ns[i];
  }

  cout << " tsc ticks      " << "ns     " << "tsc ticks per ns" << endl;
  for (int i = 0; i < RUNS; ++i)
    cout << elapsed_ticks[i] << " " << elapsed_ns[i] << " " << std::setprecision(12) << rates[i] << endl;

  if (set_rate)
    _ticks_per_ns = rates[RUNS-1];
}

constexpr uint64_t BILLION {1000000000};

uint64_t to_ns(const timespec &ts) {
  return ts.tv_sec * BILLION + ts.tv_nsec;
}

void get_start() { // Get start time both in tsc ticks as _start_tsc, and in ns since epoch as _start_clock_time
  timespec ts;
  uint64_t beg, end;

// loop to ensure we aren't interrupted between the two tsc reads
  while (1) {
    beg = rdtscp_start();
    clock_gettime(CLOCK, &ts);
    end = rdtscp_end();   
    if ((end - beg) <= 2000) // max ticks per clock call
      break;
  }

  _start_tsc = (end + beg) / 2;
  _start_clock_time = to_ns(ts); // converts timespec to ns since epoch
}

uint64_t tsc_to_ns(uint64_t tsc) { // Convert tsc ticks into absolute ns:
  // Absolute ns is defined by this linear extrapolation from the start point where
  //_start_tsc[in ticks] corresponds to _start_clock_time[in ns].
  uint64_t diff = tsc - _start_tsc;
  return _start_clock_time + static_cast<uint64_t>(diff / _ticks_per_ns);
}

在运行viewRates后立即输出linearExtrapolator:

don@HAL:~/UNIX/OS/3EZPcs/Ch06$ ./viewRates 
Clock resolution: 1ns

 tsc ticks      ns       tsc ticks per ns
28070466526 10000176697 2.8069970538
28070500272 10000194599 2.80699540335
28070489661 10000196097 2.80699392179
28070404159 10000170879 2.80699245029
28070464811 10000197285 2.80699110338
28070445753 10000195177 2.80698978932
28070430538 10000194298 2.80698851457
28070427907 10000197673 2.80698730414
28070409903 10000195492 2.80698611597
28070398177 10000195328 2.80698498942
don@HAL:~/UNIX/OS/3EZPcs/Ch06$ ./linearExtrapolator
Clock resolution: 1ns
 tsc ticks      ns     tsc ticks per ns
28070385587 10000197480 2.8069831264
_start_clock_time: 1497966724156422794, _start_tsc: 4758879747559
elapsed ns: 0, elapsed ticks: 0, ns_diff: 0

 tsc ticks      ns     tsc ticks per ns
28070364084 10000193633 2.80698205596
elapsed ns: 10000247486, elapsed ticks: 28070516229, ns_diff: -3465

 tsc ticks      ns     tsc ticks per ns
28070358445 10000195130 2.80698107188
elapsed ns: 20000496849, elapsed ticks: 56141027929, ns_diff: -10419

 tsc ticks      ns     tsc ticks per ns
28070350693 10000195646 2.80698015186
elapsed ns: 30000747550, elapsed ticks: 84211534141, ns_diff: -20667

 tsc ticks      ns     tsc ticks per ns
28070324772 10000189692 2.80697923105
elapsed ns: 40000982325, elapsed ticks: 112281986547, ns_diff: -34158

 tsc ticks      ns     tsc ticks per ns
28070340494 10000198352 2.80697837242
elapsed ns: 50001225563, elapsed ticks: 140352454025, ns_diff: -50742

 tsc ticks      ns     tsc ticks per ns
28070325598 10000196057 2.80697752704
elapsed ns: 60001465937, elapsed ticks: 168422905017, ns_diff: -70335

^C

viewRates输出表明,与上图中那些陡峭下降之一相对应的时间,每ns的TSC滴答声正在迅速下降.与OP中一样,linearExtrapolator输出显示clock_gettime()报告的经过ns与通过使用开始时间获得的_ticks_per_ns == 2.8069831264将经过TSC滴答转换为经过ns所获得的经过ns之差. .我不是使用elapsed nselapsed ticksns_diff进行每次打印之间使用sleep(10);,而是使用10s窗口重新运行每ns计算的TSC滴答.这会打印出当前的tsc ticks per ns比率.可以看出,在linearExtrapolator的整个运行过程中,从viewRates的输出观察到的每ns TSC滴答声下降的趋势仍在继续.

The viewRates output shows that the TSC ticks per ns are decreasing fairly rapidly with time corresponding to one of those steep drops in the plot above. The linearExtrapolator output shows, as in the OP, the difference between the elapsed ns as reported by clock_gettime(), and the elapsed ns obtained by converting the elapsed TSC ticks to elapsed ns using _ticks_per_ns == 2.8069831264 obtained at start time. Rather than a sleep(10); between each print out of elapsed ns, elapsed ticks, ns_diff, I re-run the TSC ticks per ns calculation using a 10s window; this prints out the current tsc ticks per ns ratio. It can be seen that the trend of decreasing TSC ticks per ns observed from the viewRates output is continuing throughout the run of linearExtrapolator.

elapsed ticks除以_ticks_per_ns并减去相应的elapsed ns可得到ns_diff,例如:(84211534141/2.8069831264)-30000747550 = -20667.但这不是0,主要是由于TSC滴答/ns的漂移.如果我们使用从最近的10s间隔中获得的ns值为2.80698015186 ticks,则结果将是:(84211534141/2.80698015186)-30000747550 =11125.在该最后的10s间隔中累积的附加错误-20667--10419 =- 10248,当在该间隔中使用正确的每秒ns的TSC滴答值时,几乎消失:(84211534141-56141027929)/2.80698015186-(30000747550-20000496849)= 349.

Dividing an elapsed ticks by _ticks_per_ns and subtracting the corresponding elapsed ns gives the ns_diff, e.g.: (84211534141 / 2.8069831264) - 30000747550 = -20667. But this is not 0 mainly due the drift in TSC ticks per ns. If we had used a value of 2.80698015186 ticks per ns obtained from the last 10s interval, the result would be: (84211534141 / 2.80698015186) - 30000747550 = 11125. The additional error accumulated during that last 10s interval, -20667 - -10419 = -10248, nearly disappears when the correct TSC ticks per ns value is used for that interval: (84211534141 - 56141027929) / 2.80698015186 - (30000747550 - 20000496849) = 349.

如果linearExtrapolator是在每ns的TSC滴答数保持恒定的时间运行的,则精度将受到确定(恒定)_ticks_per_ns的程度的限制,然后需要付出一定的代价,例如,是几个估算值的中位数.如果_ticks_per_ns偏离固定的十亿分之40,那么每10秒将出现约400ns的恒定漂移,因此ns_diff将每10秒增加/缩小400.

If the linearExtrapolator had been run at a time when the TSC ticks per ns had been constant, the accuracy would be limited by how well the (constant) _ticks_per_ns had been determined, and then it would pay to take, e.g., a median of several estimates. If the _ticks_per_ns was off by a fixed 40 parts per billion, a constant drift of about 400ns every 10 seconds would be expected, so ns_diff would grow/shrink by 400 each 10 seconds.

genTimeSeriesofRates.cc可用于为上述绘图生成数据: genTimeSeriesofRates.cc:

genTimeSeriesofRates.cc can be used to generate data for a plot like above: genTimeSeriesofRates.cc:

#include <time.h>
#include <unistd.h>
#include <iostream>
#include <iomanip>
#include <algorithm>
#include <array>
#include "rdtscp.h"

using std::cout;  using std::cerr;  using std::endl;  using std::array;

double get_ticks_per_ns(long &ticks, long &ns); // Get median tsc ticks per ns, ticks and ns.
long ts_to_ns(const timespec &ts);

#define CLOCK CLOCK_REALTIME            // clock_gettime() clock to use.
#define TIMESTEP 10
#define NSTEPS  10000
#define RUNS 5            // Number of RUNS and SLEEP interval used for each sample in get_ticks_per_ns().
#define SLEEP 1

int main() {
  timespec ts;
  clock_getres(CLOCK, &ts);
  cerr << "CLOCK resolution: " << ts_to_ns(ts) << "ns\n";

  clock_gettime(CLOCK, &ts);
  int start_time = ts.tv_sec;

  double ticks_per_ns;
  int running_elapsed_time = 0; //approx secs since start_time to center of the sampling done by get_ticks_per_ns()
  long ticks, ns;
  for (int timestep = 0; timestep < NSTEPS; ++timestep) {
    clock_gettime(CLOCK, &ts);
    ticks_per_ns = get_ticks_per_ns(ticks, ns);
    running_elapsed_time = ts.tv_sec - start_time + RUNS * SLEEP / 2;

    cout << running_elapsed_time << ' ' << ticks << ' ' << ns << ' ' 
     << std::setprecision(12) << ticks_per_ns << endl;

    sleep(10);
  }
}

double get_ticks_per_ns(long &ticks, long &ns) {
  // get the median over RUNS runs of elapsed tsc ticks, CLOCK ns, and their ratio over a SLEEP secs time interval 
  timespec clock_start, clock_end;
  long tsc_start, tsc_end;
  array<long, RUNS> elapsed_ns, elapsed_ticks;
  array<double, RUNS> rates; // arrays from each run from which to get medians.

  for (int i = 0; i < RUNS; ++i) {
    clock_gettime(CLOCK, &clock_start);
    tsc_start = rdtscp_end(); // minimizes time between clock_start and tsc_start.
    sleep(SLEEP);
    clock_gettime(CLOCK, &clock_end);
    tsc_end = rdtscp_end();

    elapsed_ticks[i] = tsc_end - tsc_start;
    elapsed_ns[i] = ts_to_ns(clock_end) - ts_to_ns(clock_start);
    rates[i] = static_cast<double>(elapsed_ticks[i]) / elapsed_ns[i];
  }

  // get medians:
  std::nth_element(elapsed_ns.begin(), elapsed_ns.begin() + RUNS/2, elapsed_ns.end());
  std::nth_element(elapsed_ticks.begin(), elapsed_ticks.begin() + RUNS/2, elapsed_ticks.end());
  std::nth_element(rates.begin(), rates.begin() + RUNS/2, rates.end());
  ticks = elapsed_ticks[RUNS/2];
  ns = elapsed_ns[RUNS/2];

  return rates[RUNS/2];
}

constexpr long BILLION {1000000000};

long ts_to_ns(const timespec &ts) {
  return ts.tv_sec * BILLION + ts.tv_nsec;
}

这篇关于在具有constant_tsc和nonstop_tsc的CPU上,为什么我的时间漂移​​了?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆