False sharing and pthreads

Question

I have the following task to demonstrate false sharing and wrote a simple program:

#include <sys/times.h>
#include <time.h>
#include <stdio.h> 
#include <pthread.h> 

long long int tmsBegin1,tmsEnd1,tmsBegin2,tmsEnd2,tmsBegin3,tmsEnd3;

int array[100];

void *heavy_loop(void *param) { 
  int   index = *((int*)param);
  int   i;
  for (i = 0; i < 100000000; i++)
    array[index]+=3;
  return NULL;  /* a pthread start routine must return a void* */
}

int main(int argc, char *argv[]) { 
  int       first_elem  = 0;
  int       bad_elem    = 1;
  int       good_elem   = 32;
  long long time1;
  long long time2;
  long long time3;
  pthread_t     thread_1;
  pthread_t     thread_2;

  tmsBegin3 = clock();
  heavy_loop((void*)&first_elem);
  heavy_loop((void*)&bad_elem);
  tmsEnd3 = clock();

  tmsBegin1 = clock();
  pthread_create(&thread_1, NULL, heavy_loop, (void*)&first_elem);
  pthread_create(&thread_2, NULL, heavy_loop, (void*)&bad_elem);
  pthread_join(thread_1, NULL);
  pthread_join(thread_2, NULL);
  tmsEnd1 = clock(); 

  tmsBegin2 = clock();
  pthread_create(&thread_1, NULL, heavy_loop, (void*)&first_elem);
  pthread_create(&thread_2, NULL, heavy_loop, (void*)&good_elem);
  pthread_join(thread_1, NULL);
  pthread_join(thread_2, NULL);
  tmsEnd2 = clock();

  printf("%d %d %d\n", array[first_elem],array[bad_elem],array[good_elem]);
  time1 = (tmsEnd1-tmsBegin1)*1000/CLOCKS_PER_SEC;
  time2 = (tmsEnd2-tmsBegin2)*1000/CLOCKS_PER_SEC;
  time3 = (tmsEnd3-tmsBegin3)*1000/CLOCKS_PER_SEC;
  printf("%lld ms\n", time1);
  printf("%lld ms\n", time2);
  printf("%lld ms\n", time3);

  return 0; 
} 

I was very surprised when I saw the results (I ran it on my Core i5-430M processor).


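  • With false sharing, it took 1020 ms.

  • Without false sharing, it took 710 ms, which is only about 30% faster rather than 300% (some sites claim it should be 300-400% faster).

  • Without pthreads, it took 580 ms.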

Please show me my mistake or explain why it happens.

Recommended answer

False sharing is a result of multiple cores with separate caches accessing the same region of physical memory (though not the same address, which would be true sharing).

To understand false sharing, you need to understand caches. In most processors, each core will have its own L1 cache, which holds recently accessed data. Caches are organized in "lines", which are aligned chunks of data, usually 32 or 64 bytes in length (depending on your processor). When you read from an address that's not in the cache, the whole line is read from main memory (or an L2 cache) into L1. When you write to an address in the cache, the line containing that address is marked "dirty".

Here's where the sharing aspect comes in. If multiple cores are reading from the same line, they can each have a copy of the line in L1. However, if one core writes to its copy and marks it dirty, the line is invalidated in the other caches. If this didn't happen, writes made on one core might not be visible to other cores until much later. So the next time another core goes to read from that line, the cache misses, and it has to fetch the line again.

False sharing occurs when the cores are reading and writing to different addresses on the same line. Even though they are not sharing data, the caches act like they are since they are so close.

This effect is highly dependent on the architecture of your processor. If you had a single core processor, you would not see the effect at all, since there would be no sharing. If your cache lines were longer, you would see the effect in both the "bad" and "good" cases, since they are still close together. If your cores did not share an L2 cache (which I'm guessing they do), you might see 300-400% difference as you said, since they would have to go all the way to main memory on a cache miss.

You might also like to know that it's important that each thread is both reading and writing (+= instead of =). Some processors have write-through caches, often paired with a no-write-allocate policy, which means that if a core writes to an address not in the cache, it doesn't miss and fetch the line from memory. Contrast this with write-back caches, which typically allocate on write and so do miss on writes.
