GPU - System memory mapping

Question

How is system memory (RAM) mapped for GPU access? I am clear about how virtual memory works for the CPU, but am not sure how it works when the GPU accesses GPU-mapped system memory (host). Basically, it is about how data is copied from system memory to GPU memory and vice versa. Can you provide explanations backed by reference articles, please?

Answer

I found the following slideset quite useful: http://developer.amd.com/afds/assets/presentations/1004_final.pdf

MEMORY SYSTEM ON FUSION APUS: The Benefits of Zero Copy. Pierre Boudier, AMD Fellow of OpenGL/OpenCL; Graham Sellers, AMD Manager of OpenGL.

AMD Fusion Developer Summit June 2011

Be aware, however, that this is a fast-moving area. It is not so much about developing new concepts as about (finally) applying concepts like virtual memory to GPUs. Let me summarize.

In the old days, say prior to 2010, GPUs were usually separate PCI or PCI Express cards or boards. They had some DRAM on board the GPU card. This on-board DRAM was pretty fast. They could also access DRAM on the CPU side, typically via DMA copy engines across PCI. GPU access to CPU memory like this was usually quite slow.
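
To make the copy-engine path concrete, here is a minimal sketch using the CUDA runtime API (my own illustration; the original answer does not name any API): an explicit cudaMemcpy between ordinary host DRAM and cudaMalloc'd on-board GPU DRAM is exactly the kind of transfer those DMA copy engines service.

```cuda
// Minimal sketch (illustration only): the classic "copy across PCI/PCIe" model.
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

int main() {
    const size_t n = 1 << 20;

    // Ordinary, pageable CPU DRAM.
    float *host_buf = (float *)malloc(n * sizeof(float));
    for (size_t i = 0; i < n; ++i) host_buf[i] = (float)i;

    // On-board GPU DRAM.
    float *dev_buf = NULL;
    cudaMalloc((void **)&dev_buf, n * sizeof(float));

    // Host -> device: the driver stages the pages and a DMA copy engine
    // pulls them across the bus into GPU memory.
    cudaMemcpy(dev_buf, host_buf, n * sizeof(float), cudaMemcpyHostToDevice);

    // ... kernels would now work entirely out of fast GPU DRAM ...

    // Device -> host: same copy engines, opposite direction.
    cudaMemcpy(host_buf, dev_buf, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(dev_buf);
    free(host_buf);
    printf("copied %zu floats each way\n", n);
    return 0;
}
```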

GPU memory was not paged. For that matter, GPU memory was usually uncached, except by the software-managed caches inside the GPU, like the texture caches. "Software managed" means these caches are not cache coherent and must be manually flushed.

Typically, only a small section of the CPU DRAM was accessed by the GPU - an aperture. Typically, it was pinned - not subject to paging. Usually, not even subject to virtual address translation - typically virtual address = physical address, + maybe some offset.
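
As a hedged illustration of pinning (again my own sketch, using the CUDA runtime API rather than anything from the answer): cudaHostAlloc returns page-locked host memory, i.e. pages the OS will not swap out or relocate, which is what allows a DMA copy engine to address them directly and copy asynchronously.

```cuda
// Minimal sketch (illustration only): pinned host memory as a DMA target.
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 64 * 1024 * 1024;

    // Page-locked ("pinned") host memory: not subject to paging.
    float *pinned = NULL;
    cudaHostAlloc((void **)&pinned, bytes, cudaHostAllocDefault);

    float *dev = NULL;
    cudaMalloc((void **)&dev, bytes);

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Because the source pages cannot move, this copy can run on a copy
    // engine asynchronously while the CPU keeps working.
    cudaMemcpyAsync(dev, pinned, bytes, cudaMemcpyHostToDevice, stream);
    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    cudaFree(dev);
    cudaFreeHost(pinned);
    return 0;
}
```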

(Of course, the rest of CPU memory is proper virtual memory: paged, translated, and cached. It's just that the GPU could not access it safely, because the GPU did not have access to the virtual memory subsystem or the cache coherence system.)

Now, the above works, but it's a pain. Operating on something first inside the CPU and then inside the GPU is slow and error prone. It is also a great security risk: user-provided GPU code often could access (slowly and unsafely) all of CPU DRAM, and so could be used by malware.

AMD has announced a goal of more tightly integrating GPUs and CPUs. One of the first steps was to create the "Fusion" APUs, chips containing both CPUs and GPUs. (Intel has done something similar with Sandy Bridge; I expect ARM to do so as well.)

AMD has also announced that they intend to have the GPU use the virtual memory subsystem, and use caches.

A step in the direction of having the GPU use virtual memory is the AMD IOMMU. Intel has something similar, although IOMMUs are more oriented towards virtual machines than towards virtual memory for non-virtual-machine OSes.

Systems where the CPU and GPU are inside the same chip typically have the CPU and GPU access the same DRAM chips. So there is no longer a distinction between "on-GPU-board" DRAM and "off-GPU" CPU DRAM.

But there usually still is a split, a partition, of the DRAM on the system motherboard into memory mainly used by the CPU and memory mainly used by the GPU. Even though the memory may live inside the same DRAM chips, typically a big chunk is "graphics". In the paper above it is called "Local" memory, for historical reasons. CPU and graphics memory may be tuned differently - typically the GPU memory is lower priority (except for video refresh) and has longer bursts.

In the paper I refer you to, there are different internal busses: Onion for "system" memory, and "Garlic" for faster access to the graphics memory partition. Garlic memory is typically uncached.

The paper I refer to talks about how the CPU and GPU have different page tables. Their subtitle, "the benefits of zero copy", refers to mapping a CPU data structure into the GPU page tables so that you don't need to copy it.
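
The paper's own examples are OpenCL on Fusion APUs; as a rough CUDA-flavoured analogue (my sketch, not taken from the paper), mapped pinned memory gives a kernel a device pointer into a host buffer so it can work on the data in place, with no copy:

```cuda
// Minimal sketch (illustration only): zero-copy access to a mapped host buffer.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(float *data, float factor, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;  // touches the mapped host buffer directly, no staging copy
}

int main() {
    const size_t n = 1 << 20;

    // Allow host allocations to be mapped into the GPU's address space.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    // Pinned + mapped: the allocation gets an entry in the GPU's page tables.
    float *host_ptr = NULL;
    cudaHostAlloc((void **)&host_ptr, n * sizeof(float), cudaHostAllocMapped);
    for (size_t i = 0; i < n; ++i) host_ptr[i] = 1.0f;

    // The GPU-side view of the same pages.
    float *dev_ptr = NULL;
    cudaHostGetDevicePointer((void **)&dev_ptr, host_ptr, 0);

    scale<<<(n + 255) / 256, 256>>>(dev_ptr, 2.0f, n);
    cudaDeviceSynchronize();

    printf("host_ptr[0] = %f (updated in place, no cudaMemcpy)\n", host_ptr[0]);
    cudaFreeHost(host_ptr);
    return 0;
}
```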

And so on.

This area of the system is evolving rapidly, so the 2011 paper is already almost obsolete. But you should note the trends:

(a) software WANTS uniform access to CPU and GPU memory - virtual memory and cacheable

but

(b) although hardware tries to provide (a), special graphics memory features nearly always make dedicated graphics memory, even if it is just a partition of the same DRAMs, significantly faster or more power efficient.

The gap may be narrowing, but every time you think it is about to go away, another hardware trick can be played.

---

BTW, this answer from 2012 should be updated - I am writing this in 2019. Much still applies, e.g. the CPU/GPU memory distinction. GPU memory is still higher speed, but nowadays there is often more GPU memory than CPU memory, at least in datacenter DL systems. Not so much in home PCs. Also, GPUs now support virtual memory. This is by no means a full update.
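
As one concrete illustration of "GPUs now support virtual memory" (my sketch, using CUDA managed memory, which the answer does not mention by name): a single managed allocation is demand-paged between CPU and GPU, so neither side issues explicit copies.

```cuda
// Minimal sketch (illustration only): unified/managed memory with demand paging.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void add_one(int *data, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    const size_t n = 1 << 20;

    // One pointer valid on both CPU and GPU; pages migrate on demand.
    int *data = NULL;
    cudaMallocManaged((void **)&data, n * sizeof(int), cudaMemAttachGlobal);

    for (size_t i = 0; i < n; ++i) data[i] = 0;   // pages fault in on the CPU side

    add_one<<<(n + 255) / 256, 256>>>(data, n);   // pages migrate to the GPU on demand
    cudaDeviceSynchronize();

    printf("data[0] = %d\n", data[0]);            // pages migrate back when the CPU touches them
    cudaFree(data);
    return 0;
}
```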
