Is it possible to access hard disk directly from GPU?

Question

Is it possible to access a hard disk/flash disk directly from the GPU (CUDA/OpenCL) and load/store content directly from the GPU's memory?

I am trying to avoid copying data from disk to host memory and then copying it over to the GPU's memory.

I read about Nvidia GPUDirect, but I am not sure whether it does what I explained above. It talks about remote GPU memory and disks, but in my case the disks are local to the GPU.

The basic idea is to load contents (something like DMA) -> do some operations -> store the contents back to disk (again in DMA fashion).

I am trying to involve the CPU and RAM as little as possible here.
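
A minimal sketch of this load -> compute -> store path, assuming NVIDIA's GPUDirect Storage (cuFile) API, which is not mentioned anywhere in the original question or answer: the file name, buffer size and kernel below are placeholders, and a GDS-capable driver and filesystem stack is required.

```cpp
// Sketch: read a file directly into GPU memory, run a kernel, write it back,
// all via DMA through cuFile (GPUDirect Storage). Build with something like:
//   nvcc gds_sketch.cu -lcufile -o gds_sketch
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <cuda_runtime.h>
#include <cufile.h>

__global__ void process(char* buf, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) buf[i] ^= 0xFF;                 // placeholder for "do some operations"
}

int main() {
    const size_t size = 1 << 20;               // 1 MiB, placeholder size (O_DIRECT-aligned)
    int fd = open("data.bin", O_RDWR | O_DIRECT);  // O_DIRECT is required for GDS
    if (fd < 0) { perror("open"); return 1; }

    cuFileDriverOpen();
    CUfileDescr_t descr = {};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t fh;
    cuFileHandleRegister(&fh, &descr);

    char* devPtr = nullptr;
    cudaMalloc(&devPtr, size);
    cuFileBufRegister(devPtr, size, 0);        // register (pin) the GPU buffer for DMA

    cuFileRead(fh, devPtr, size, 0, 0);        // disk -> GPU memory
    process<<<(unsigned)((size + 255) / 256), 256>>>(devPtr, size);
    cudaDeviceSynchronize();
    cuFileWrite(fh, devPtr, size, 0, 0);       // GPU memory -> disk

    cuFileBufDeregister(devPtr);
    cudaFree(devPtr);
    cuFileHandleDeregister(fh);
    cuFileDriverClose();
    close(fd);
    return 0;
}
```

With this path the payload never lands in a host staging buffer; on systems without GDS support the cuFile library can fall back to a bounce-buffer compatibility mode, which reintroduces the host copy.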

Please feel free to offer any suggestions about the design.

Answer

For anyone else looking for this, 'lazy unpinning' did more or less what I wanted.

Go through the following to see if it can be helpful for you.

The most straightforward implementation using RDMA for GPUDirect would pin memory before each transfer and unpin it right after the transfer is complete. Unfortunately, this would perform poorly in general, as pinning and unpinning memory are expensive operations. The rest of the steps required to perform an RDMA transfer, however, can be performed quickly without entering the kernel (the DMA list can be cached and replayed using MMIO registers/command lists).
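
To get a feel for why per-transfer pinning hurts, here is a small stand-alone micro-benchmark that is not part of the quoted text. GPUDirect RDMA actually pins GPU memory from a kernel-mode driver (nvidia_p2p_get_pages / nvidia_p2p_put_pages); host-memory pinning via cudaHostRegister is used below only as a user-space stand-in to show the rough cost of a pin/unpin round trip on a large buffer.

```cpp
// Rough cost of pinning/unpinning a 64 MiB buffer, repeated a few times.
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main() {
    const size_t size = 64 << 20;              // 64 MiB
    void* buf = malloc(size);

    const int iters = 10;
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) {
        cudaHostRegister(buf, size, cudaHostRegisterDefault);  // pin (page-lock)
        cudaHostUnregister(buf);                               // unpin
    }
    auto t1 = std::chrono::steady_clock::now();

    double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    printf("pin + unpin of %zu MiB: %.2f ms per round trip\n",
           size >> 20, ms / iters);
    free(buf);
    return 0;
}
```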

Hence, lazily unpinning memory is key to a high-performance RDMA implementation. What it implies is keeping the memory pinned even after the transfer has finished. This takes advantage of the fact that the same memory region is likely to be used for future DMA transfers, so lazy unpinning saves pin/unpin operations.

An example implementation of lazy unpinning would keep a set of pinned memory regions and only unpin some of them (for example, the least recently used one) if the total size of the regions reached some threshold, or if pinning a new region failed because of BAR space exhaustion (see PCI BAR sizes).
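
The scheme described above maps naturally onto a small cache data structure. Below is a minimal sketch, assuming hypothetical pin_region()/unpin_region() wrappers around whatever call actually pins memory for RDMA (in GPUDirect this happens in kernel space, e.g. via nvidia_p2p_get_pages); regions are evicted least-recently-used once a total-size cap is hit, as the quoted text suggests.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>

struct PinnedRegion {
    uintptr_t addr;   // start of the pinned memory region
    size_t    size;   // region length in bytes
    // a real implementation would also carry the driver's DMA/page-table handle
};

// Hypothetical stand-ins for the driver's real pin/unpin calls
// (e.g. nvidia_p2p_get_pages / nvidia_p2p_put_pages for GPUDirect RDMA).
PinnedRegion pin_region(uintptr_t addr, size_t size) { return {addr, size}; }
void unpin_region(const PinnedRegion&) { /* release the DMA mapping here */ }

class LazyUnpinCache {
public:
    explicit LazyUnpinCache(size_t max_bytes) : max_bytes_(max_bytes) {}

    // Return a pinned region covering [addr, addr + size), pinning it on a miss.
    const PinnedRegion& acquire(uintptr_t addr, size_t size) {
        auto it = index_.find(addr);
        if (it != index_.end()) {
            lru_.splice(lru_.begin(), lru_, it->second);   // hit: mark most recently used
            return *it->second;
        }
        // Miss: evict least-recently-used regions until the new one fits under the cap.
        while (pinned_bytes_ + size > max_bytes_ && !lru_.empty()) {
            evict_lru();
        }
        lru_.push_front(pin_region(addr, size));
        index_[addr] = lru_.begin();
        pinned_bytes_ += size;
        return lru_.front();
    }

    // Also the right response if pinning a new region fails due to BAR space exhaustion.
    void evict_lru() {
        const PinnedRegion& victim = lru_.back();
        pinned_bytes_ -= victim.size;
        index_.erase(victim.addr);
        unpin_region(victim);
        lru_.pop_back();
    }

private:
    size_t max_bytes_;
    size_t pinned_bytes_ = 0;
    std::list<PinnedRegion> lru_;   // front = most recently used
    std::unordered_map<uintptr_t, std::list<PinnedRegion>::iterator> index_;
};

int main() {
    LazyUnpinCache cache(256u << 20);                       // keep at most 256 MiB pinned
    const PinnedRegion& r = cache.acquire(0x200000000ull, 1 << 20);
    (void)r;  // hand the region's DMA handle to the transfer engine here
    return 0;
}
```

A real implementation would also round addresses and sizes to the driver's pinning granularity and handle partially overlapping regions; those details are left out of the sketch.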

Here is a link to the application guide in the NVIDIA documentation.
