Padding array manually

Question

I am trying to understand the 9-point stencil algorithm from this book. The logic is clear to me, but I am unable to understand how the WIDTHP macro is calculated. Here is the brief code (the original is more than 300 lines long!):

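#include <stdlib.h>

typedef float REAL;  /* assumed here; the answer's arithmetic implies a 4-byte REAL */

#define PAD64 0
#define WIDTH 5900
#if PAD64
/* Round the row size up to a multiple of 64 bytes, expressed in elements */
#define WIDTHP ((((WIDTH*sizeof(REAL))+63)/64)*(64/sizeof(REAL)))
#else
#define WIDTHP WIDTH
#endif
#define HEIGHT 10000

/* Inside a function: each row occupies WIDTHP elements, of which WIDTH are used */
REAL *fa = (REAL *)malloc(sizeof(REAL)*WIDTHP*HEIGHT);
REAL *fb = (REAL *)malloc(sizeof(REAL)*WIDTHP*HEIGHT);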

The original array is 5900 x 10000, but if I set PAD64 to 1, the array becomes 5915.75 x 10000.

So far, my guess is that the author is trying to align and pad the array to a 64-byte boundary. But the memory returned by malloc is usually aligned (and padded), and posix_memalign gives you a chunk of memory that is guaranteed to have the requested alignment. We can also use

__attribute__((aligned(64)))
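For reference, a minimal sketch of what that attribute does, assuming GCC or Clang (the attribute is a compiler extension, spelled `aligned`) and a 4-byte `REAL`. The names `small_grid`, `small_grid_c11`, `is_aligned64`, and `check_grids` are hypothetical. The attribute (and its standard C11 counterpart `_Alignas`) only affects objects whose storage the compiler lays out, such as static arrays:

```c
#include <assert.h>
#include <stdint.h>

typedef float REAL;  /* assumption: 4-byte REAL */

/* GCC/Clang extension: request 64-byte alignment for a static array.
   16 floats = 64 bytes, so every row of this array also starts a line. */
static REAL small_grid[64][16] __attribute__((aligned(64)));

/* Standard C11 spelling of the same request. */
static _Alignas(64) REAL small_grid_c11[64][16];

/* Returns 1 if p lies on a 64-byte boundary, 0 otherwise. */
static int is_aligned64(const void *p)
{
    return ((uintptr_t)p % 64) == 0;
}

/* Self-check: aborts (via assert) if either array is misaligned. */
static void check_grids(void)
{
    assert(is_aligned64(small_grid));
    assert(is_aligned64(small_grid_c11));
}
```

Neither spelling can change the address that malloc returns at run time, which is why dynamically allocated grids need a different mechanism.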

What impact can this WIDTHP have on my code's performance?

Answer

I was going to post this as a comment on unwind's answer, because he's right. But perhaps I can explain more clearly, albeit in more characters than fit in a comment.

When I do the math (with a 4-byte REAL), I get 5904 reals, which is 23616 bytes, or 369 cache lines of 64 bytes each. It is the number of bytes, not the number of elements, that must be a multiple of 64.
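The arithmetic can be checked with a small sketch that mirrors the WIDTHP macro as a function; `padded_width`, `lines_per_row`, and `padded_width_float` are hypothetical helper names. It also suggests where the 5915.75 in the question probably comes from: evaluating the same formula without integer truncation.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors WIDTHP for any element size and line size: round the row size in
   bytes up to a whole number of lines, then convert back to elements.
   Integer division truncates, which is what makes "+ line - 1" round up.
   Assumes line is a multiple of elem_size. */
static size_t padded_width(size_t width, size_t elem_size, size_t line)
{
    assert(line % elem_size == 0);
    return ((width * elem_size + line - 1) / line) * (line / elem_size);
}

/* Number of cache lines one padded row occupies. */
static size_t lines_per_row(size_t width, size_t elem_size, size_t line)
{
    return padded_width(width, elem_size, line) * elem_size / line;
}

/* The same formula in floating point: nothing is truncated, so it does not
   round up and produces a fractional "width". */
static double padded_width_float(double width, double elem_size, double line)
{
    return ((width * elem_size + line - 1) / line) * (line / elem_size);
}
```

With a 4-byte element, `padded_width(5900, 4, 64)` is 5904 and `lines_per_row(5900, 4, 64)` is 369. With an 8-byte `double` the padded width is also 5904 (738 lines), while `padded_width_float(5900, 4, 64)` gives 5915.75.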

As to why you want to pad the width, let's look at a smaller example. Suppose we have a "cache line" that holds 10 letters and an "array" with a width of 8 letters and a height of 4. Since our hypothetical array is in C, and C is row-major, the array looks something like this:

AAAAAAAA BBBBBBBB CCCCCCCC DDDDDDDD

But what does it look like when it is arranged in cache lines, which are 10 letters long?

AAAAAAAABB BBBBBBCCCC CCCCDDDDDD DD

Not good. Only the first row of the array starts at the beginning of a cache line. But if we pad the width by two spaces, we get this in cache:

AAAAAAAA__ BBBBBBBB__ CCCCCCCC__ DDDDDDDD__
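A minimal sketch that reproduces these toy layouts; `render_layout` and `aligned_rows` are hypothetical helpers, where a "cache line" holds `line` characters, each row is stored `pitch` slots apart, and padding slots are shown as `_`:

```c
#include <assert.h>
#include <stddef.h>

/* Writes the row-major layout of a height x width array whose rows are
   `pitch` slots apart into `out`, inserting a space at every `line`-character
   boundary. Row r uses letter 'A' + r. Returns the characters written. */
static size_t render_layout(char *out, size_t cap, int width, int pitch,
                            int height, int line)
{
    size_t n = 0;
    int total = pitch * height;
    assert(cap > 0 && pitch >= width && line > 0);
    for (int k = 0; k < total; k++) {
        if (k > 0 && k % line == 0) {
            assert(n + 1 < cap);
            out[n++] = ' ';                 /* a new cache line starts here */
        }
        int row = k / pitch, col = k % pitch;
        assert(n + 1 < cap);
        out[n++] = (col < width) ? (char)('A' + row) : '_';
    }
    out[n] = '\0';
    return n;
}

/* Counts rows whose first element falls exactly on a cache-line boundary. */
static int aligned_rows(int pitch, int height, int line)
{
    int count = 0;
    for (int r = 0; r < height; r++)
        if ((r * pitch) % line == 0)
            count++;
    return count;
}
```

With `width = 8`, `height = 4`, `line = 10` and `pitch = 8`, `render_layout` produces `AAAAAAAABB BBBBBBCCCC CCCCDDDDDD DD` and `aligned_rows` reports 1; with `pitch = 10` it produces `AAAAAAAA__ BBBBBBBB__ CCCCCCCC__ DDDDDDDD__` and all 4 rows are aligned.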

This is what we want. Now we can have a nested loop like

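for (int i = 0; i < HEIGHT; i++)           /* one row per iteration */
    for (int j = 0; j < WIDTH; j++) {
        /* work on fa[i * WIDTHP + j]; row i starts at fa + i * WIDTHP */
    }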

and know that every time we start to work on the j loop, the data we need will be aligned.
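As an illustration of how the padded pitch is used in a real kernel, here is a sketch of one sweep of a generic 9-point stencil. It is not the book's code: the function name `stencil9_sweep` and the three weights are hypothetical, and `REAL` is assumed to be `float`. The point is only that rows are addressed with the padded pitch (such as WIDTHP) while the inner loop runs over the real width.

```c
#include <assert.h>
#include <stddef.h>

typedef float REAL;  /* assumption: 4-byte REAL */

/* One sweep over the interior of a height x width grid whose rows are `pitch`
   elements apart (pitch >= width). Hypothetical weights: wc for the centre,
   we for the four edge neighbours, wd for the four corners. Border cells and
   padding slots of `out` are left untouched. */
static void stencil9_sweep(const REAL *in, REAL *out, size_t width,
                           size_t height, size_t pitch,
                           REAL wc, REAL we, REAL wd)
{
    assert(pitch >= width);
    for (size_t i = 1; i + 1 < height; i++) {
        const REAL *up  = in + (i - 1) * pitch;  /* row i-1 */
        const REAL *mid = in + i * pitch;        /* row i   */
        const REAL *dn  = in + (i + 1) * pitch;  /* row i+1 */
        REAL *dst = out + i * pitch;
        /* If in/out are 64-byte aligned and pitch * sizeof(REAL) is a
           multiple of 64, each of these row pointers starts a cache line. */
        for (size_t j = 1; j + 1 < width; j++) {
            dst[j] = wc * mid[j]
                   + we * (mid[j - 1] + mid[j + 1] + up[j] + dn[j])
                   + wd * (up[j - 1] + up[j + 1] + dn[j - 1] + dn[j + 1]);
        }
    }
}
```

Passing `WIDTH` as `width` and `WIDTHP` as `pitch` keeps the arithmetic on the real data while every row begins on a fresh cache line.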

Oh, and yes, they really should do something to make sure that the first element of the array is aligned. __attribute__((aligned(64))) won't work because the arrays are allocated dynamically, but they could have used posix_memalign instead of malloc.
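A minimal sketch of that posix_memalign alternative, assuming a POSIX system and a 4-byte `REAL`; `alloc_grid64` and `row_aligned64` are hypothetical helpers. Combined with a padded pitch such as WIDTHP, every row then starts on a 64-byte boundary.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef float REAL;  /* assumption: 4-byte REAL */

/* Allocates `height` rows of `pitch` REALs with the first element on a
   64-byte boundary. Returns NULL on failure; release with free(). */
static REAL *alloc_grid64(size_t pitch, size_t height)
{
    void *p = NULL;
    assert(pitch > 0 && height > 0);
    /* posix_memalign returns 0 on success and an error number otherwise */
    if (posix_memalign(&p, 64, sizeof(REAL) * pitch * height) != 0)
        return NULL;
    return (REAL *)p;
}

/* Does row `row` of a grid with the given pitch start on a 64-byte boundary? */
static int row_aligned64(const REAL *grid, size_t pitch, size_t row)
{
    return ((uintptr_t)(grid + row * pitch) % 64) == 0;
}
```

For example, `REAL *fa = alloc_grid64(WIDTHP, HEIGHT);` could replace the malloc call, with `free(fa)` afterwards. With a pitch of 5904 floats (23616 bytes) every row is aligned; with the unpadded 5900 floats (23600 bytes), row 1 would start 48 bytes into a cache line.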
