Allocating memory for data used by MTLBuffer in iOS Metal


Question

This is a follow-up question to this answer. I am trying to replace a for-loop running on the CPU with a kernel function in Metal, to parallelize the computation and improve performance.

My function is basically a convolution. Since I repeatedly receive new data for my input array values (the data comes from an AVCaptureSession), using newBufferWithBytesNoCopy:length:options:deallocator: seems like the sensible option for creating the MTLBuffer objects. Here is the relevant code:

id <MTLBuffer> dataBuffer = [device newBufferWithBytesNoCopy:dataVector length:sizeof(dataVector) options:MTLResourceStorageModeShared deallocator:nil];
id <MTLBuffer> filterBuffer = [device newBufferWithBytesNoCopy:filterVector length:sizeof(filterVector) options:MTLResourceStorageModeShared deallocator:nil];
id <MTLBuffer> outBuffer = [device newBufferWithBytesNoCopy:outVector length:sizeof(outVector) options:MTLResourceStorageModeShared deallocator:nil];

When running this I get the following error:


failed assertion `newBufferWithBytesNoCopy:pointer 0x16fd0bd48 is not 4096 byte aligned.'

Right now, I am not allocating any memory, but (for testing purposes) just creating an empty array of floats of a fixed size and filling it up with random numbers. So my main question is:

How do I allocate these float arrays the right way, so that the following requirement is met?


This value must result in a page-aligned region of memory.

Additionally, I have some further questions:


  • Does it even make sense to create the MTLBuffer with the newBufferWithBytesNoCopy method, or is copying the data not really an issue in terms of performance? (My actual data will consist of approximately 43'000 float values per video frame.)
  • Is MTLResourceStorageModeShared the correct choice for MTLResourceOptions?
  • The API reference says:


The storage allocation of the returned new MTLBuffer object is the same as the pointer input value. The existing memory allocation must be covered by a single VM region, typically allocated with vm_allocate or mmap. Memory allocated by malloc is specifically disallowed.

Does this apply only to the output buffer, or should the storage allocation for all objects used with MTLBuffer not be done with malloc?

Recommended answer

The easiest way to allocate page-aligned memory is with posix_memalign. Here's a complete example of creating a buffer with page-aligned memory:

void *data = NULL;
NSUInteger pageSize = getpagesize();
// Required byte count, rounded up to the next multiple of the page size.
// Ten pages is only a placeholder here.
NSUInteger allocationSize = pageSize * 10;
// posix_memalign returns 0 on success and leaves a page-aligned pointer in data.
int result = posix_memalign(&data, pageSize, allocationSize);

if (result == 0 && data) {
    // The buffer wraps the existing allocation without copying it.
    id<MTLBuffer> buffer = [device newBufferWithBytesNoCopy:data
                                                     length:allocationSize
                                                    options:MTLResourceStorageModeShared
                                                deallocator:^(void *pointer, NSUInteger length)
                                                            {
                                                                // Called when the buffer is deallocated.
                                                                free(pointer);
                                                            }];
    NSLog(@"Created buffer of length %lu", (unsigned long)buffer.length);
}
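The example above uses pageSize * 10 as a stand-in for the real size. As a minimal sketch of the rounding step, the plain C helpers below (valid in an Objective-C file as well) compute the rounded-up length and allocate page-aligned storage for a given number of floats. The names round_up_to_page, is_page_aligned and alloc_page_aligned_floats are hypothetical and used for illustration only; they are not part of Metal or of the original answer.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Round byte_count up to the next multiple of page_size (page_size > 0). */
static size_t round_up_to_page(size_t byte_count, size_t page_size) {
    assert(page_size > 0);
    size_t remainder = byte_count % page_size;
    return remainder == 0 ? byte_count : byte_count + (page_size - remainder);
}

/* Return nonzero when ptr lies exactly on a page boundary. */
static int is_page_aligned(const void *ptr, size_t page_size) {
    return ((uintptr_t)ptr % page_size) == 0;
}

/* Allocate page-aligned storage large enough for float_count floats.
 * On success, returns the pointer and stores the rounded-up byte length
 * in *out_length; on failure, returns NULL. Release with free(). */
static float *alloc_page_aligned_floats(size_t float_count, size_t *out_length) {
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    size_t length = round_up_to_page(float_count * sizeof(float), page_size);
    void *data = NULL;
    if (length == 0 || posix_memalign(&data, page_size, length) != 0) {
        return NULL;
    }
    *out_length = length;
    return (float *)data;
}
```

For the roughly 43,000 floats per frame mentioned in the question (172,000 bytes with 4-byte floats), this rounds up to 172,032 bytes with 4,096-byte pages, or 180,224 bytes with 16,384-byte pages. The pointer from the assertion message, 0x16fd0bd48, fails the alignment check because its low 12 bits (0xd48) are not zero.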

Since you can't ensure that your data will arrive in a page-aligned pointer, you'll probably be better off just allocating an MTLBuffer of whatever size can accommodate your data, without using the no-copy variant. If you need to do real-time processing of the data, you should create a pool of buffers and cycle among them instead of waiting for each command buffer to complete. The Shared storage mode is correct for these use cases. The caveat related to malloc only applies to the no-copy case, since in every other case, Metal allocates the memory for you.
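The pool-and-cycle idea can be sketched on the CPU side in plain C. This is only an illustration of round-robin slot selection under stated assumptions: plain page-aligned memory stands in for the storage of each MTLBuffer, the slot count of three is assumed, and FramePool with its functions are hypothetical names. A real Metal pipeline would also need synchronization (for example a semaphore signaled from the command buffer's completion handler) so that a slot is not overwritten while the GPU is still reading it; that part is omitted here.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define POOL_SLOTS 3 /* assumed slot count, not taken from the answer */

typedef struct {
    void  *slots[POOL_SLOTS]; /* stand-ins for per-buffer storage */
    size_t slot_length;       /* capacity of each slot in bytes */
    size_t next;              /* index of the slot to fill next */
} FramePool;

/* Free every slot; safe to call on a partially initialized pool. */
static void frame_pool_destroy(FramePool *pool) {
    for (size_t i = 0; i < POOL_SLOTS; i++) {
        free(pool->slots[i]);
        pool->slots[i] = NULL;
    }
}

/* Allocate POOL_SLOTS page-aligned slots of slot_length bytes each.
 * Returns 0 on success and -1 on allocation failure. */
static int frame_pool_init(FramePool *pool, size_t slot_length) {
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    memset(pool, 0, sizeof(*pool));
    pool->slot_length = slot_length;
    for (size_t i = 0; i < POOL_SLOTS; i++) {
        if (posix_memalign(&pool->slots[i], page_size, slot_length) != 0) {
            pool->slots[i] = NULL;
            frame_pool_destroy(pool);
            return -1;
        }
    }
    return 0;
}

/* Copy one frame into the next slot and advance the index round-robin.
 * Returns the slot that now holds the frame, or NULL if it does not fit. */
static void *frame_pool_write(FramePool *pool, const void *frame, size_t frame_length) {
    if (frame_length > pool->slot_length) {
        return NULL;
    }
    void *slot = pool->slots[pool->next];
    memcpy(slot, frame, frame_length);
    pool->next = (pool->next + 1) % POOL_SLOTS;
    return slot;
}
```

With three slots, the fourth frame lands in the same slot as the first, which is why the missing GPU synchronization matters in a real pipeline.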
