How can I read from pinned (page-locked) RAM, and not from the CPU cache (zero-copy DMA with a GPU)?


Question

If I use DMA for RAM <-> GPU transfers in CUDA C++, how can I be sure that the memory will be read from the pinned (page-locked) RAM, and not from the CPU cache?

After all, with DMA the CPU does not know that someone else has changed the memory, or that the CPU caches need to be synchronized with RAM. As far as I know, a C++11 memory fence such as std::atomic_thread_fence() does not help with DMA and will not force a read from RAM; it only enforces ordering between the L1/L2/L3 caches. Furthermore, there is in general no protocol for resolving conflicts between the cache and RAM on the CPU; there are only coherence protocols that synchronize the different levels of CPU cache (L1/L2/L3) and multiple CPUs in NUMA systems: MOESI / MESIF.
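For reference, a minimal sketch of the zero-copy setup the question assumes is shown below: host memory is allocated page-locked and mapped into the GPU's address space with cudaHostAlloc(cudaHostAllocMapped), and the kernel accesses it through the device pointer returned by cudaHostGetDevicePointer(). The kernel, buffer name and size are hypothetical, and error checking is omitted.

```
// Minimal sketch (assumed setup, not from the original post): zero-copy mapped pinned memory.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2.0f;          // the GPU reads/writes host RAM directly over PCIe
}

int main()
{
    const int n = 1 << 20;
    float *h_buf = nullptr;       // pinned (page-locked) host pointer
    float *d_buf = nullptr;       // device alias of the same physical memory

    cudaSetDeviceFlags(cudaDeviceMapHost);                          // allow mapping pinned memory
    cudaHostAlloc(&h_buf, n * sizeof(float), cudaHostAllocMapped);  // page-locked + mapped
    cudaHostGetDevicePointer(&d_buf, h_buf, 0);                     // GPU view of the same allocation

    for (int i = 0; i < n; ++i)
        h_buf[i] = 1.0f;

    scale<<<(n + 255) / 256, 256>>>(d_buf, n);
    cudaDeviceSynchronize();                    // wait until the GPU's DMA traffic has completed

    printf("h_buf[0] = %f\n", h_buf[0]);        // prints 2.0 with no explicit cudaMemcpy
    cudaFreeHost(h_buf);
    return 0;
}
```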

Answer

On x86, the CPU does snoop bus traffic, so this is not a concern. On Sandy Bridge class CPUs, the PCI Express controller is integrated into the CPU, so the CPU can actually service GPU reads directly from its L3 cache, or update its cache based on writes by the GPU.
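To illustrate the snooping point, here is a hedged sketch (not part of the original answer): the GPU writes a payload and a flag into mapped pinned memory, and the host observes the write with no manual cache maintenance. The GPU-side __threadfence_system() orders the payload before the flag, and the host simply re-reads the flag through a volatile pointer, relying on the CPU snooping the DMA writes to keep its cached copy coherent. The names and the single-thread kernel are illustrative; error checking is omitted.

```
// Sketch (assumption, not from the answer): GPU-produced data becomes visible to
// the CPU through snooping, without explicit cache flushes on the host side.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void produce(volatile int *data, volatile int *flag)
{
    *data = 42;               // payload written straight into pinned host RAM
    __threadfence_system();   // order the payload before the flag, system-wide
    *flag = 1;                // publish
}

int main()
{
    int *h = nullptr, *d = nullptr;
    cudaSetDeviceFlags(cudaDeviceMapHost);
    cudaHostAlloc(&h, 2 * sizeof(int), cudaHostAllocMapped);   // h[0] = data, h[1] = flag
    cudaHostGetDevicePointer(&d, h, 0);
    h[0] = 0;
    h[1] = 0;

    produce<<<1, 1>>>(d, d + 1);   // kernel launch is asynchronous

    // The spin loop works because the CPU snoops the GPU's DMA writes and
    // invalidates/updates its cached copy of the line; no manual flush is needed.
    while (((volatile int *)h)[1] == 0) { }
    printf("data = %d\n", h[0]);

    cudaDeviceSynchronize();
    cudaFreeHost(h);
    return 0;
}
```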
