Measuring context switch time for threads

Question

I want to calculate the context switch time, and I am thinking of using a mutex and condition variables to signal between 2 threads so that only one thread runs at a time. I can use CLOCK_MONOTONIC to measure the entire execution time and CLOCK_THREAD_CPUTIME_ID to measure how long each thread runs.
Then the context switch time is (total_time - thread_1_time - thread_2_time). To get a more accurate result, I can just loop over it and take the average.

Is this a correct way to approximate the context switch time? I can't think of anything that might go wrong, but I am getting answers that are under 1 nanosecond.

I forgot to mention that the more iterations I loop over and average, the smaller the results I get.

Edit

Here is the code snippet I have:

    #include <pthread.h>
    #include <time.h>

    typedef struct
    {
        struct timespec start;
        struct timespec end;
    } thread_time;

    /* ... */

    // each thread function looks similar to this
    void* thread_1_func(void* arg)
    {
        // the local variable must not reuse the typedef name `thread_time`,
        // otherwise the cast in its own initializer does not compile
        thread_time* t = (thread_time*) arg;

        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &t->start);
        for (int x = 0; x < loop; ++x)
        {
            // where it switches to the other thread
        }
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &t->end);

        return NULL;
    }

    void* thread_2_func(void* arg)
    {
        // similar to the above
        return NULL;
    }

    int main(void)
    {
        /* ... */
        pthread_t thread_1;
        pthread_t thread_2;

        thread_time thread_1_time;
        thread_time thread_2_time;

        struct timespec start, end;

        // stamp the start time
        clock_gettime(CLOCK_MONOTONIC, &start);

        // create two threads with the time structs as the arguments
        pthread_create(&thread_1, NULL, &thread_1_func, &thread_1_time);
        pthread_create(&thread_2, NULL, &thread_2_func, &thread_2_time);

        // wait for the two threads to terminate
        pthread_join(thread_1, NULL);
        pthread_join(thread_2, NULL);

        // stamp the end time
        clock_gettime(CLOCK_MONOTONIC, &end);

        // then I calculate the difference between the total execution time
        // and the execution times of the two threads
        return 0;
    }
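A minimal sketch of the arithmetic for that last step follows. The helper names `elapsed_ns` and `estimate_switch_ns` are hypothetical. You would have to supply the number of switches yourself: roughly two per loop iteration when the threads strictly alternate is an assumption, not something the code above establishes.

```c
#include <stdint.h>
#include <time.h>

// Hypothetical helper: elapsed nanoseconds between two timespec readings.
static int64_t elapsed_ns(const struct timespec *start, const struct timespec *end)
{
    return (int64_t)(end->tv_sec - start->tv_sec) * 1000000000LL
           + (int64_t)(end->tv_nsec - start->tv_nsec);
}

// Hypothetical helper: the question's estimate, i.e. wall time minus both
// threads' CPU time, averaged over the number of switches. The result can be
// tiny or even negative when the two threads actually ran in parallel.
static double estimate_switch_ns(int64_t total_ns, int64_t thread_1_ns,
                                 int64_t thread_2_ns, long switches)
{
    return (double)(total_ns - thread_1_ns - thread_2_ns) / (double)switches;
}
```

With the variables from `main()` above, the call could look like `estimate_switch_ns(elapsed_ns(&start, &end), elapsed_ns(&thread_1_time.start, &thread_1_time.end), elapsed_ns(&thread_2_time.start, &thread_2_time.end), 2L * loop)`.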

Recommended answer

First of all, using CLOCK_THREAD_CPUTIME_ID is probably very wrong. This clock measures the CPU time charged to that one thread, and on Linux that includes time spent in the kernel on the thread's behalf. Much of the switching work therefore likely ends up counted in one of the two thread times and cancels out when you subtract, so you'd want another clock. Also, on multiprocessor systems these CPU-time clocks can give different values from one processor to another! Thus I suggest you use CLOCK_REALTIME or CLOCK_MONOTONIC instead. Be warned, however, that even if you read either of these twice in rapid succession, the timestamps will usually already be tens of nanoseconds apart.
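To see how coarse that floor is on a given machine, the sketch below assumes Linux/POSIX and uses hypothetical helper names. It reports the advertised resolution via `clock_getres` and the smallest gap between two back-to-back `clock_gettime` calls. A single switch shorter than that gap cannot be timed directly, which is one reason to time many switches and divide.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

// Resolution the kernel reports for a clock, in nanoseconds (-1 on error).
static int64_t clock_res_ns(clockid_t clk)
{
    struct timespec r;
    if (clock_getres(clk, &r) != 0)
        return -1;
    return (int64_t)r.tv_sec * 1000000000LL + r.tv_nsec;
}

// Smallest gap observed between two back-to-back reads of the same clock,
// i.e. roughly the cost of reading the clock itself.
static int64_t min_back_to_back_ns(clockid_t clk, int samples)
{
    int64_t best = INT64_MAX;
    for (int i = 0; i < samples; ++i) {
        struct timespec a, b;
        clock_gettime(clk, &a);
        clock_gettime(clk, &b);
        int64_t d = (int64_t)(b.tv_sec - a.tv_sec) * 1000000000LL
                    + (b.tv_nsec - a.tv_nsec);
        if (d < best)
            best = d;
    }
    return best;
}
```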

As for context switches, there are many kinds of them. The fastest approach is to switch from one thread to another entirely in software. This just means that you push the old registers onto the stack, set the task-switched flag so that the SSE/FP registers will be saved lazily, save the stack pointer, load the new stack pointer and return from that function. Since the other thread had done the same, the return from that function happens in the other thread.
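To get a feel for a switch done in user space, the sketch below bounces between two contexts using the ucontext functions. These are obsolescent in POSIX but still provided by glibc. This only approximates what the paragraph describes: glibc's `swapcontext` also saves and restores the signal mask with a system call, so treat the number as an upper bound. The name `user_switch_ns` and the 64 KiB stack size are arbitrary, hypothetical choices.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <time.h>
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;
static long co_remaining;
static char co_stack[64 * 1024];   // private stack for the coroutine

// Coroutine body: each pass saves its registers and stack pointer into
// co_ctx and resumes main_ctx, i.e. one user-level switch back to main.
static void co_body(void)
{
    while (co_remaining-- > 0)
        swapcontext(&co_ctx, &main_ctx);
}

// Average nanoseconds per swapcontext() switch (each loop pass = 2 switches).
static double user_switch_ns(long iters)
{
    if (getcontext(&co_ctx) != 0)
        return -1.0;
    co_ctx.uc_stack.ss_sp = co_stack;
    co_ctx.uc_stack.ss_size = sizeof co_stack;
    co_ctx.uc_link = &main_ctx;      // only used if co_body returns (not here)
    makecontext(&co_ctx, co_body, 0);
    co_remaining = iters;

    struct timespec a, b;
    clock_gettime(CLOCK_MONOTONIC, &a);
    for (long i = 0; i < iters; ++i)
        swapcontext(&main_ctx, &co_ctx);   // main -> coroutine -> back to main
    clock_gettime(CLOCK_MONOTONIC, &b);

    int64_t ns = (int64_t)(b.tv_sec - a.tv_sec) * 1000000000LL
                 + (b.tv_nsec - a.tv_nsec);
    return (double)ns / (2.0 * (double)iters);
}
```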

This thread-to-thread switch is quite fast; its overhead is about the same as that of any system call. Switching from one process to another is much slower. The reason is that the address space must be switched by loading the new page-table base into the CR3 register (on x86). This flushes the TLB entries for user-space mappings and causes TLB misses afterwards, since the TLB is what caches the mapping from virtual to physical addresses.
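Here is a rough sketch for the process-to-process case, assuming Linux/POSIX. Two processes take turns reading and writing one byte through a pair of pipes, so each hop requires the reader to be woken up. The result includes the `read()`/`write()` system-call overhead, not only the switch.

Unless both processes are pinned to the same CPU, they may run on different cores and the hop then measures a cross-CPU wake-up. One way to pin them is to call `sched_setaffinity` before `fork()`, since the child inherits the mask. `process_pingpong_ns` is a hypothetical name.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

// Returns the average ns per one-way hop, or -1.0 on error.
static double process_pingpong_ns(long iters)
{
    int to_child[2], to_parent[2];
    if (pipe(to_child) != 0)
        return -1.0;
    if (pipe(to_parent) != 0) {
        close(to_child[0]);
        close(to_child[1]);
        return -1.0;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(to_child[0]);  close(to_child[1]);
        close(to_parent[0]); close(to_parent[1]);
        return -1.0;
    }
    if (pid == 0) {
        // Child: echo each byte back until the parent closes its end (EOF).
        close(to_child[1]);
        close(to_parent[0]);
        char c;
        while (read(to_child[0], &c, 1) == 1)
            if (write(to_parent[1], &c, 1) != 1)
                break;
        _exit(0);
    }

    // Parent: keep only the write end to the child and the read end back.
    close(to_child[0]);
    close(to_parent[1]);

    char c = 'x';
    long done = 0;
    struct timespec a, b;
    clock_gettime(CLOCK_MONOTONIC, &a);
    while (done < iters
           && write(to_child[1], &c, 1) == 1
           && read(to_parent[0], &c, 1) == 1)
        ++done;
    clock_gettime(CLOCK_MONOTONIC, &b);

    close(to_child[1]);      // child reads EOF and exits
    close(to_parent[0]);
    waitpid(pid, NULL, 0);

    if (done != iters)
        return -1.0;
    int64_t ns = (int64_t)(b.tv_sec - a.tv_sec) * 1000000000LL
                 + (b.tv_nsec - a.tv_nsec);
    return (double)ns / (2.0 * (double)iters);   // two hops per round trip
}
```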

However, a context switch or system call overhead of under 1 ns does not seem plausible. Most likely the machine has hyperthreading or 2 CPU cores, so the two threads are actually running at the same time and their CPU times overlap the wall-clock interval. I suggest that you set the CPU affinity on that process so that Linux only ever runs it on, say, the first CPU core:

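    #define _GNU_SOURCE   // must precede all #includes for cpu_set_t, CPU_SET, sched_setaffinity
    #include <sched.h>
    #include <stdio.h>

    // inside main(), before creating the threads:
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(0, &mask);
    // pid 0 means the calling thread; threads it creates afterwards inherit the mask
    int result = sched_setaffinity(0, sizeof(mask), &mask);
    if (result != 0)
        perror("sched_setaffinity");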

Then you should be pretty sure that the time you're measuring comes from a real context switch. Also, to measure the time for switching the floating-point/SSE state (which happens lazily), you should have some floating-point variables and do calculations on them before the context switch. Then add, say, 0.1 to some volatile floating-point variable after the context switch to see whether it has an effect on the switching time.
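Putting the pieces together, here is a sketch of the question's mutex/condition-variable ping-pong with the answer's suggestions applied. Both threads are pinned to CPU 0, timing uses CLOCK_MONOTONIC wall time instead of per-thread CPU time, and an optional flag touches floating-point registers around each switch.

This sketch assumes Linux/glibc and that CPU 0 is in the allowed CPU set. The names (`player`, `thread_switch_ns`, `touch_fp`, ...) are hypothetical. The figure includes mutex/condition-variable and futex overhead plus one thread creation, so it is an upper bound on the bare switch cost. Recent x86 Linux kernels save FP/SSE state eagerly rather than lazily, so the FP variant may show little difference there.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <time.h>

static pthread_mutex_t pp_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t pp_cond = PTHREAD_COND_INITIALIZER;
static int turn;                 // id (0 or 1) of the thread allowed to run
static long rounds;              // hand-offs made by each thread
static int touch_fp;             // nonzero: use FP registers around each switch
static volatile double fp_sink;  // volatile so the FP work is not optimized away

static void *player(void *arg)
{
    int me = *(int *)arg;
    double acc = 1.0;
    pthread_mutex_lock(&pp_lock);
    for (long i = 0; i < rounds; ++i) {
        // Sleep until it is our turn; pinned to one CPU, the other thread runs.
        while (turn != me)
            pthread_cond_wait(&pp_cond, &pp_lock);
        if (touch_fp) {
            fp_sink += 0.1;               // FP work right after the switch
            acc = acc * 1.000001 + 0.5;   // FP work before the next switch
        }
        turn = 1 - me;                    // hand the turn to the other thread
        pthread_cond_signal(&pp_cond);
    }
    fp_sink += acc;                       // keep acc live (still under the lock)
    pthread_mutex_unlock(&pp_lock);
    return NULL;
}

// Pins the calling thread to CPU 0 (threads it creates afterwards inherit the
// mask; the caller also stays pinned after returning) and returns wall-clock
// ns per switch, or -1.0 on error.
static double thread_switch_ns(long n, int fp)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(0, &mask);
    if (sched_setaffinity(0, sizeof mask, &mask) != 0)
        return -1.0;

    int ids[2] = {0, 1};
    turn = 0;
    rounds = n;
    touch_fp = fp;

    struct timespec a, b;
    clock_gettime(CLOCK_MONOTONIC, &a);
    pthread_t other;
    if (pthread_create(&other, NULL, player, &ids[0]) != 0)
        return -1.0;
    player(&ids[1]);                      // the calling thread plays id 1
    pthread_join(other, NULL);
    clock_gettime(CLOCK_MONOTONIC, &b);

    int64_t ns = (int64_t)(b.tv_sec - a.tv_sec) * 1000000000LL
                 + (b.tv_nsec - a.tv_nsec);
    return (double)ns / (2.0 * (double)n);   // about 2 switches per round
}
```

Compile with `-pthread` and compare, for example, `thread_switch_ns(100000, 0)` with `thread_switch_ns(100000, 1)`. Unlike the original approach, the result no longer depends on how the kernel attributes CPU time to each thread.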
