C: Writing code which can be auto vectorized, nested loop, GCC


Question

I am trying to write some C code which can be vectorized. This is the loop I am working on:

for(jj=0;jj<params.nx;jj++)
    for(kk=0;kk<NSPEEDS;kk++)
        local_density_vec[jj] += tmp_cells_chunk[jj].speeds[kk];

GCC gives me the following messages when run with the -ftree-vectorizer-verbose=5 flag: http://pastebin.com/RfCc04aS

How can I rewrite it so that it can be auto-vectorized? NSPEEDS is 5.

I've continued to work on it, but I don't seem to be able to vectorize anything that accesses .speeds[kk]. Is there a way of restructuring it so that it can be vectorized?

Recommended answer

for (jj = 0; jj < nx; jj++) {
        partial = 0.0f;
        fp = c[jj].speeds;
        for (kk = 0; kk < M; kk++)
                partial += fp[kk];
        out[jj] = partial;
}
(...)
Calculated minimum iters for profitability: 12

36:   Profitability threshold = 11

Vectorizing loop at autovect.c:36

36: Profitability threshold is 11 loop iterations.
36: LOOP VECTORIZED.

Key points:

1) In your dump, the loop was rejected as having a "complicated access pattern" (see the last line of your log). As already commented, this is related to the compiler being unable to rule out aliasing. For "simple" access patterns, see: http://gcc.gnu.org/projects/tree-ssa/vectorization.html#vectorizab

2) My example loop needs at least 12 iterations for vectorization to be profitable. Since NSPEEDS == 5, the compiler would lose time if it vectorized yours.

3) I was only able to vectorize my loop after I added -funsafe-math-optimizations. I believe this is required because the vectorized sum adds the terms in a different order, and floating-point addition is not associative, so the rounding of the result can differ. See, for example: http://en.wikipedia.org/wiki/Floating_point#Accuracy_problems

4) If you reverse the loop order, you could run into "complicated" access patterns again. As already commented, you may need to reverse the array organization. Check the GCC vectorization docs on strided accesses to see whether you can match one of the supported patterns.

For completeness, here is the complete example: http://pastebin.com/CWhyqUny
