MOVDQU instruction + page boundary

Question

I have a simple test program that uses the movdqu instruction to load an xmm register from data that straddles a page boundary (OS: Linux).

If the following page is mapped, this works fine. If it is not mapped, I get a SIGSEGV, which is probably expected.
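The behavior described above can be reproduced with a sketch like the one below. It is not the asker's original test program (which is not shown). The function name `unaligned_load_faults`, the use of `mprotect(PROT_NONE)` as a stand-in for an unmapped page, and the fork-based fault detection are all assumptions made for illustration, and the inline assembly assumes x86-64 Linux with GCC or Clang.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <emmintrin.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: performs a 16-byte movdqu load that starts `before`
 * bytes ahead of an inaccessible page. The load runs in a child process so
 * the caller survives the fault.
 * Returns 1 if the load died with SIGSEGV, 0 if it completed, -1 on error. */
int unaligned_load_faults(size_t before)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || before == 0 || before > (size_t)page)
        return -1;

    /* Two adjacent pages; the second one is made inaccessible. */
    unsigned char *base = mmap(NULL, 2 * (size_t)page, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;
    if (mprotect(base + page, (size_t)page, PROT_NONE) != 0) {
        munmap(base, 2 * (size_t)page);
        return -1;
    }

    const unsigned char *p = base + page - before;

    pid_t pid = fork();
    if (pid == 0) {
        __m128i v;
        /* Explicit movdqu so the compiler cannot narrow or drop the load. */
        __asm__ volatile("movdqu (%1), %0" : "=x"(v) : "r"(p) : "memory");
        (void)v;
        _exit(0);
    }

    int status = 0;
    int result = -1;
    if (pid > 0 && waitpid(pid, &status, 0) == pid) {
        if (WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV)
            result = 1;
        else if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            result = 0;
    }
    munmap(base, 2 * (size_t)page);
    return result;
}
```

Calling it with `before = 16` keeps the whole load inside the readable page, while any smaller value pushes at least one byte of the load into the protected page.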

However, this diminishes the usefulness of unaligned loads quite a bit. SSE4.2 instructions that allow unaligned memory operands, such as pcmpistri, appear to behave the same way.

That's all fine, except that many of the pcmpistri-based strcmp implementations I've found don't seem to address this issue at all. I've been able to contrive trivial test cases that make these implementations fail, while a trivial byte-at-a-time strcmp works fine with the same data layout.

One more note: the GNU C library for 64-bit Linux has a __strcmp_sse42 variant that appears to use the pcmpistri instruction in a safer manner. The implementation of this strcmp is fairly complex, but it appears to carefully avoid the page boundary issue. I'm not sure whether that's because of the issue I describe above, or whether it's just a side effect of trying to get better performance by aligning the data.
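One common way for a vectorized string routine to guard against this is to test whether the next wide load would cross a page boundary before issuing it, and fall back to a slower path if so. The sketch below shows only that test. It is not taken from glibc, and the helper name `load_crosses_page` and the fixed 4096-byte page size are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size; real code could query sysconf(_SC_PAGESIZE) instead. */
#define PAGE_SIZE_ASSUMED ((uintptr_t)4096)

/* Returns 1 if a load of `width` bytes starting at `p` would touch the
 * page after the one containing `p`, 0 otherwise. */
int load_crosses_page(const void *p, size_t width)
{
    assert(width > 0 && width <= PAGE_SIZE_ASSUMED);

    /* Offset of p inside its page. */
    uintptr_t offset = (uintptr_t)p & (PAGE_SIZE_ASSUMED - 1);

    /* The load covers [offset, offset + width); it crosses when that
     * range extends past the end of the page. */
    return offset > PAGE_SIZE_ASSUMED - width;
}
```

A strcmp built on 16-byte loads could call this with `width = 16` for each operand and handle the last few bytes of a page one byte at a time when it returns 1.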

Anyway, my main question is: where can I find out more about this issue? I've searched Google for "movdqu crossing page boundary" and every variant of that I can think of, but haven't come across anything particularly useful. If anyone can point me to further information on this, it would be greatly appreciated.

Recommended answer

First, any algorithm that tries to access an unmapped address will cause a segfault. If non-AVX code used a 4-byte load to access the last byte of a page and the first 3 bytes of "the next page", which happened not to be mapped, it would also segfault. No? I believe the "issue" is that the AVX (1/2/3) registers are so much bigger than "typical" registers that algorithms which were already unsafe (but got away with it) get caught when they are trivially extended to the larger registers.

Aligned loads (MOVDQA) can never have this problem, since they never cross a boundary of their own size or greater, and a page boundary is always such a boundary. Unaligned loads CAN have this problem (as you've noted) and "often" do. The reason is that the instruction is defined to load the full size of the target register. You need to look at the operand types in the instruction definitions quite carefully. It doesn't matter how much of the data you are interested in; what matters is what the instruction is defined to do.
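To illustrate why aligned loads are safe, here is a minimal sketch of a string-length routine that only ever issues 16-byte aligned loads: it rounds the pointer down to a 16-byte boundary and discards the bytes that precede the string. The name `strlen_aligned_sse2` is hypothetical, and the code assumes x86 with SSE2 and GCC or Clang (for `__builtin_ctz`). Reading bytes outside the string object is outside what ISO C guarantees, so treat this as a demonstration of the hardware behavior rather than portable code.

```c
#include <emmintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Finds the terminating NUL using only aligned 16-byte loads. An aligned
 * 16-byte block never straddles a page, so every block that contains at
 * least one byte of the string lies in a mapped page. */
size_t strlen_aligned_sse2(const char *s)
{
    const __m128i zero = _mm_setzero_si128();
    uintptr_t addr = (uintptr_t)s;
    unsigned lead = (unsigned)(addr & 15);        /* bytes before s in block */
    const __m128i *block = (const __m128i *)(addr - lead);

    /* First block: one bit per byte that equals 0, then drop the bits
     * that belong to the bytes before the start of the string. */
    unsigned mask = (unsigned)_mm_movemask_epi8(
        _mm_cmpeq_epi8(_mm_load_si128(block), zero));
    mask >>= lead;
    if (mask != 0)
        return (size_t)__builtin_ctz(mask);

    size_t len = 16 - lead;                       /* bytes scanned so far */
    for (;;) {
        ++block;
        mask = (unsigned)_mm_movemask_epi8(
            _mm_cmpeq_epi8(_mm_load_si128(block), zero));
        if (mask != 0)
            return len + (size_t)__builtin_ctz(mask);
        len += 16;
    }
}
```

The trade-off is the extra masking work for the first block, which is the price of never touching a page the string does not occupy.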

But...

AVX1 (Sandy Bridge) added a "masked move" capability, which is slower than movdqa or movdqu but will not (architecturally) access the unmapped page, as long as the mask is disabled for the portion of the access that would have fallen in that page. This is meant to address the issue. In general, going forward, it appears that masked-off portions of loads/stores (see AVX-512) will not cause access violations on IA either.
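A rough sketch of the masked-move idea, using the AVX intrinsic `_mm_maskload_ps` (VMASKMOVPS): the load nominally spans 16 bytes across a page boundary, but the two elements that would fall in the inaccessible page are masked off. The helper names, the `mprotect(PROT_NONE)` guard page, and the runtime CPU check are assumptions for illustration. The code assumes x86-64 Linux with GCC or Clang, and note that the mask works at 4-byte element granularity, not per byte.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <immintrin.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Loads the first n_valid floats (0..4) from p; masked-off lanes read as 0
 * and, per the instruction definition, do not fault. */
__attribute__((target("avx")))
static void masked_load_floats(const float *p, int n_valid, float out[4])
{
    /* A lane is enabled when the sign bit of its 32-bit mask element is set. */
    __m128i mask = _mm_setr_epi32(n_valid > 0 ? -1 : 0, n_valid > 1 ? -1 : 0,
                                  n_valid > 2 ? -1 : 0, n_valid > 3 ? -1 : 0);
    _mm_storeu_ps(out, _mm_maskload_ps(p, mask));
}

/* Hypothetical demo: returns 1 if the masked load across the boundary
 * produced the expected values, 0 if not, -1 if AVX or the setup is
 * unavailable. */
int masked_load_demo(void)
{
    if (!__builtin_cpu_supports("avx"))
        return -1;

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    unsigned char *base = mmap(NULL, 2 * (size_t)page, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;
    if (mprotect(base + page, (size_t)page, PROT_NONE) != 0) {
        munmap(base, 2 * (size_t)page);
        return -1;
    }

    /* Two floats fit before the boundary; lanes 2 and 3 would lie in the
     * inaccessible page. */
    float *p = (float *)(base + page - 2 * sizeof(float));
    p[0] = 1.0f;
    p[1] = 2.0f;

    float out[4];
    masked_load_floats(p, 2, out);

    int ok = out[0] == 1.0f && out[1] == 2.0f && out[2] == 0.0f && out[3] == 0.0f;
    munmap(base, 2 * (size_t)page);
    return ok;
}
```

An unmasked `_mm_loadu_ps` at the same address would touch the protected page and fault.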

(It is a bummer about the PCMPxSTRx behavior. Perhaps you could add 15 bytes of padding to your "string" objects?)
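The padding suggestion could look something like the sketch below, where every string copy gets 15 extra zero bytes after its terminator so that a 16-byte load starting at any byte of the string, including the terminator, stays inside the allocation. The name `padded_strdup` and the constant `SIMD_PAD` are made up for illustration, and this only helps for strings you allocate yourself.

```c
#include <stdlib.h>
#include <string.h>

/* Extra bytes after the NUL terminator: 16-byte load width minus one. */
#define SIMD_PAD 15

/* Returns a heap copy of s followed by SIMD_PAD zero bytes, or NULL on
 * allocation failure. The caller frees the result with free(). */
char *padded_strdup(const char *s)
{
    size_t len = strlen(s);

    /* calloc zero-fills, which supplies both the terminator and the pad. */
    char *copy = calloc(len + 1 + SIMD_PAD, 1);
    if (copy == NULL)
        return NULL;

    memcpy(copy, s, len);
    return copy;
}
```

The cost is up to 15 wasted bytes per string, in exchange for not needing a page-boundary check before each wide load.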
