Is it possible to allocate, in user space, a non-cacheable block of memory on Linux?


Problem Description

I have a bunch of buffers (25 to 30 of them) in my application that are fairly large (about 0.5 MB each) and are accessed simultaneously. To make it even worse, the data in them is generally read only once, and it is updated frequently (around 30 times per second). It's sort of the perfect storm of non-optimal cache use.

Anyhow, it occurred to me that it would be cool if I could mark a block of memory as non-cacheable. Theoretically, this would leave more room in the cache for everything else.

So, is there a way to get a block of memory marked as non-cacheable in Linux?

Solution

How to avoid polluting the caches with data like this is covered in What Every Programmer Should Know About Memory (PDF). It is written from the perspective of Red Hat development, so it is a good fit for you, although most of it is cross-platform.

What you want is called "non-temporal access": you tell the processor that the value you are reading now will not be needed again for a while, so it can avoid caching that value.
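The oldest and most widely available way to express this hint from C on x86 is a non-temporal prefetch (`prefetchnta`), issued through `_mm_prefetch` with `_MM_HINT_NTA` from `<xmmintrin.h>`. It is only a hint that the processor may ignore. The sketch below assumes GCC or Clang on x86/x86-64; the function name `sum_bytes_nta` and the prefetch distance `PREFETCH_AHEAD` are hypothetical choices for illustration, not tuned values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>   /* SSE: _mm_prefetch, _MM_HINT_NTA */

/* Hypothetical prefetch distance in bytes; tune by measurement. */
#define PREFETCH_AHEAD 256

/* Read a buffer once, hinting (per 64-byte line) that the prefetched data
 * will not be reused soon, so it should disturb the caches as little as possible. */
uint64_t sum_bytes_nta(const uint8_t *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        /* One prefetch per cache line, a fixed distance ahead of the reads. */
        if (i % 64 == 0 && i + PREFETCH_AHEAD < n)
            _mm_prefetch((const char *)(buf + i + PREFETCH_AHEAD), _MM_HINT_NTA);
        total += buf[i];
    }
    return total;
}
```

Whether this actually reduces cache pollution depends on the microarchitecture, so it is worth measuring against a plain loop.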

See page 49 of the PDF linked above. It uses an Intel intrinsic to stream data around the cache:

On the read side, processors, until recently, lacked support aside from weak hints using non-temporal access (NTA) prefetch instructions. There is no equivalent to write-combining for reads, which is especially bad for uncacheable memory such as memory-mapped I/O. Intel, with the SSE4.1 extensions, introduced NTA loads. They are implemented using a small number of streaming load buffers; each buffer contains a cache line. The first movntdqa instruction for a given cache line will load a cache line into a buffer, possibly replacing another cache line. Subsequent 16-byte aligned accesses to the same cache line will be serviced from the load buffer at little cost. Unless there are other reasons to do so, the cache line will not be loaded into a cache, thus enabling the loading of large amounts of memory without polluting the caches. The compiler provides an intrinsic for this instruction:

#include <smmintrin.h>
__m128i _mm_stream_load_si128 (__m128i *p); 

This intrinsic should be used multiple times, with addresses of 16-byte blocks passed as the parameter, until each cache line is read. Only then should the next cache line be started. Since there are a few streaming read buffers it might be possible to read from two memory locations at once.
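A minimal sketch of the access pattern the passage describes: four 16-byte `_mm_stream_load_si128` reads per 64-byte cache line, finishing one line before starting the next. It assumes GCC or Clang on x86-64 with an SSE4.1-capable CPU; `__attribute__((target("sse4.1")))` lets it compile without `-msse4.1`. The function `sum_u32_stream` is hypothetical, and the input is assumed 16-byte aligned (ideally 64-byte aligned so each iteration covers exactly one line) with a length that is a multiple of 16 elements. One caveat: on ordinary write-back memory, some processors may treat `movntdqa` much like a normal load, so measure before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <smmintrin.h>   /* SSE4.1: _mm_stream_load_si128 */

/* Sum n 32-bit integers (wrapping on overflow) using streaming loads.
 * Assumes src is 16-byte aligned and n is a multiple of 16
 * (16 uint32_t = one 64-byte cache line). */
__attribute__((target("sse4.1")))
uint32_t sum_u32_stream(const uint32_t *src, size_t n)
{
    assert(((uintptr_t)src % 16) == 0);   /* movntdqa requires 16-byte alignment */
    assert(n % 16 == 0);

    __m128i acc = _mm_setzero_si128();
    for (size_t i = 0; i < n; i += 16) {
        /* Read all four 16-byte chunks of one line before moving on. */
        __m128i *line = (__m128i *)(src + i);
        acc = _mm_add_epi32(acc, _mm_stream_load_si128(line + 0));
        acc = _mm_add_epi32(acc, _mm_stream_load_si128(line + 1));
        acc = _mm_add_epi32(acc, _mm_stream_load_si128(line + 2));
        acc = _mm_add_epi32(acc, _mm_stream_load_si128(line + 3));
    }

    /* Horizontal sum of the four 32-bit lanes. */
    uint32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```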

This would fit your case perfectly if, when reading, the buffers are read in linear order through memory: you can use streaming reads for that. Likewise, if the buffers are modified in linear order and you don't expect to read them again any time soon from the same thread, you can use streaming writes.
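For the write side, a minimal sketch using SSE2 non-temporal stores (`_mm_stream_si128`, i.e. `movntdq`), which go through write-combining buffers rather than allocating cache lines. It assumes x86-64 with GCC or Clang; the function `fill_u32_stream` is hypothetical, and the destination is assumed 16-byte aligned with a length that is a multiple of 4 elements. A real buffer update would store computed data instead of a constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>   /* SSE2: _mm_stream_si128, _mm_set1_epi32; includes _mm_sfence */

/* Overwrite n 32-bit words of dst, in linear order, with non-temporal stores.
 * Assumes dst is 16-byte aligned and n is a multiple of 4. */
void fill_u32_stream(uint32_t *dst, size_t n, uint32_t value)
{
    assert(((uintptr_t)dst % 16) == 0);   /* movntdq requires 16-byte alignment */
    assert(n % 4 == 0);

    const __m128i v = _mm_set1_epi32((int)value);
    for (size_t i = 0; i < n; i += 4)
        _mm_stream_si128((__m128i *)(dst + i), v);

    /* Streaming stores are weakly ordered; fence before another thread
     * is signalled that the buffer is ready. */
    _mm_sfence();
}
```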
