Why can't we mark an already existing memory region as WriteCombined by using `cudaHostRegister()`?

Question

    In the CUDA SDK, the function cudaHostAlloc(), which allocates a new memory region, accepts the following flags:

    • cudaHostAllocDefault (the default, value 0; makes cudaHostAlloc() behave like cudaMallocHost())
    • cudaHostAllocPortable
    • cudaHostAllocMapped
    • cudaHostAllocWriteCombined
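For reference, a minimal sketch of the allocation path described above: requesting write-combined pinned memory at allocation time. The `check` helper, the buffer size, and the variable names are illustrative assumptions, not part of the original question.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA API): abort if a runtime call fails.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

int main() {
    const size_t n = 1 << 20;                 // arbitrary element count
    const size_t bytes = n * sizeof(float);

    // The write-combined attribute is requested when the memory is created.
    float* h_wc = nullptr;
    check(cudaHostAlloc((void**)&h_wc, bytes, cudaHostAllocWriteCombined),
          "cudaHostAlloc");

    // Fill the buffer with sequential writes only. Reading WC memory back
    // on the CPU is very slow, so the host never reads h_wc here.
    for (size_t i = 0; i < n; ++i) {
        h_wc[i] = static_cast<float>(i);
    }

    // Transfer the data to the device.
    float* d_buf = nullptr;
    check(cudaMalloc((void**)&d_buf, bytes), "cudaMalloc");
    check(cudaMemcpy(d_buf, h_wc, bytes, cudaMemcpyHostToDevice), "cudaMemcpy");

    check(cudaFree(d_buf), "cudaFree");
    check(cudaFreeHost(h_wc), "cudaFreeHost");
    return 0;
}
```

A buffer like this is normally treated as write-only on the host side, which is why the sketch never reads `h_wc` back.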

    To mark a memory region that is already allocated, we can use cudaHostRegister() with the following flags:

    • 0 (default)
    • cudaHostRegisterPortable
    • cudaHostRegisterMapped
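By contrast, a minimal sketch of the registration path: page-locking memory that already exists. It assumes a Linux host (for `posix_memalign` and `sysconf`); the `check` helper and the sizes are illustrative. The buffer is page-aligned because early CUDA releases required that for registration. None of the flags listed above requests write-combining, so the sketch uses `cudaHostRegisterPortable`.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <unistd.h>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA API): abort if a runtime call fails.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

int main() {
    const size_t bytes = 1 << 20;   // arbitrary size, a multiple of the page size
    const size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));

    // Ordinary host memory, allocated outside of CUDA.
    void* h_buf = nullptr;
    if (posix_memalign(&h_buf, page, bytes) != 0) {
        std::fprintf(stderr, "posix_memalign failed\n");
        return EXIT_FAILURE;
    }
    std::memset(h_buf, 0, bytes);

    // Page-lock the existing region. Only the flags listed above are
    // available here; there is no write-combined variant.
    check(cudaHostRegister(h_buf, bytes, cudaHostRegisterPortable),
          "cudaHostRegister");

    void* d_buf = nullptr;
    check(cudaMalloc(&d_buf, bytes), "cudaMalloc");
    check(cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice), "cudaMemcpy");

    check(cudaFree(d_buf), "cudaFree");
    check(cudaHostUnregister(h_buf), "cudaHostUnregister");
    std::free(h_buf);
    return 0;
}
```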

    Why can we mark memory as WriteCombined when allocating it with cudaHostAlloc() and the cudaHostAllocWriteCombined flag, but cannot mark an already existing memory region as WriteCombined by using cudaHostRegister()?

    Must already allocated memory be marked as WriteCombined only through the Linux kernel function set_memory_wc() (http://www.mjmwired.net/kernel/Documentation/x86/pat.txt#127)?

    Answer

    I did not know of any APIs that could change the cacheability of an existing VA range until you referenced set_memory_wc(). Such an operation would be extremely expensive due to all the cache flushes and TLB shootdowns that would be required; and the memory would basically be unreadable until you found some way to unmark it as WC.

    Why are you trying to use WC memory? On pre-i7 (Nehalem) CPUs, WC had slightly higher transfer performance (IIRC) because it inhibited snooping of PCI Express traffic to and from the memory. But on Nehalem and later CPUs, I don't know of any application that has concretely demonstrated a benefit from WC memory.
