Allocating memory for data used by MTLBuffer in iOS Metal
Question
This is a follow-up question to this answer. I am trying to replace a for-loop running on the CPU with a kernel function in Metal to parallelize the computation and improve performance.
My function is basically a convolution. Since I repeatedly receive new data for my input array values (the data comes from an AVCaptureSession), it seems that using newBufferWithBytesNoCopy:length:options:deallocator: is the sensible option for creating the MTLBuffer objects. Here is the relevant code:
id <MTLBuffer> dataBuffer = [device newBufferWithBytesNoCopy:dataVector length:sizeof(dataVector) options:MTLResourceStorageModeShared deallocator:nil];
id <MTLBuffer> filterBuffer = [device newBufferWithBytesNoCopy:filterVector length:sizeof(filterVector) options:MTLResourceStorageModeShared deallocator:nil];
id <MTLBuffer> outBuffer = [device newBufferWithBytesNoCopy:outVector length:sizeof(outVector) options:MTLResourceStorageModeShared deallocator:nil];
When running this I get the following error:
failed assertion `newBufferWithBytesNoCopy:pointer 0x16fd0bd48 is not 4096 byte aligned.'
Right now, I am not allocating any memory myself; for testing purposes, I just create a fixed-size array of floats and fill it with random numbers. So my main question is:
How do I allocate these float arrays correctly so that the following requirement is met?
This value must result in a page-aligned region of memory.
Additionally, a few more questions:
- Does it even make sense to create the MTLBuffer with the newBufferWithBytesNoCopy method, or is copying the data not really an issue in terms of performance? (My actual data will consist of approximately 43,000 float values per video frame.)
- Is MTLResourceStorageModeShared the correct choice for MTLResourceOptions?
- The API reference says:
The storage allocation of the returned new MTLBuffer object is the same as the pointer input value. The existing memory allocation must be covered by a single VM region, typically allocated with vm_allocate or mmap. Memory allocated by malloc is specifically disallowed.
Does this apply only to the output buffer, or should the storage for all objects used with MTLBuffer not be allocated with malloc?
Recommended answer
The easiest way to allocate page-aligned memory is with posix_memalign. Here's a complete example of creating a buffer with page-aligned memory:
void *data = NULL;
NSUInteger pageSize = getpagesize();
// Required byte count, rounded up to the next multiple of the page size.
NSUInteger allocationSize = pageSize * 10;
// posix_memalign returns 0 on success.
int result = posix_memalign(&data, pageSize, allocationSize);
if (result == 0 && data) {
    id<MTLBuffer> buffer = [device newBufferWithBytesNoCopy:data
                                                     length:allocationSize
                                                    options:MTLResourceStorageModeShared
                                                deallocator:^(void *pointer, NSUInteger length) {
        // Metal calls this block when the buffer is deallocated.
        free(pointer);
    }];
    NSLog(@"Created buffer of length %lu", (unsigned long)buffer.length);
}
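The allocation size in the example above is a placeholder (ten pages). As a minimal sketch of the "rounded up to next multiple of page size" step, the plain C helpers below compute the padded size for an arbitrary byte count and allocate it with posix_memalign. The names round_up_to_page and alloc_page_aligned are hypothetical, and sysconf(_SC_PAGESIZE) is used here as a portable stand-in for getpagesize(). Because Objective-C is a superset of C, something like this should drop into the same source file unchanged.

```c
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

// Hypothetical helper: round byte_count up to the next multiple of page_size.
// Assumes page_size is non-zero.
static size_t round_up_to_page(size_t byte_count, size_t page_size) {
    size_t remainder = byte_count % page_size;
    return remainder == 0 ? byte_count : byte_count + (page_size - remainder);
}

// Hypothetical helper: allocate at least byte_count bytes, page-aligned and
// padded to a whole number of pages. On success, returns the pointer and
// stores the padded size in *out_size; returns NULL on failure.
// The caller releases the memory with free().
static void *alloc_page_aligned(size_t byte_count, size_t *out_size) {
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    size_t size = round_up_to_page(byte_count, page_size);
    void *ptr = NULL;
    // posix_memalign returns 0 on success and leaves ptr untouched otherwise.
    if (size == 0 || posix_memalign(&ptr, page_size, size) != 0) {
        return NULL;
    }
    *out_size = size;
    return ptr;
}
```

For the roughly 43,000 floats per frame mentioned in the question (172,000 bytes), a 4096-byte page size gives a padded length of 172,032 bytes. The pointer and padded size returned by alloc_page_aligned would then be passed as the bytes and length arguments of newBufferWithBytesNoCopy:length:options:deallocator:.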
Since you can't ensure that your data will arrive in a page-aligned pointer, you'll probably be better off just allocating an MTLBuffer of whatever size can accommodate your data, without using the no-copy variant. If you need to do real-time processing of the data, you should create a pool of buffers and cycle among them instead of waiting for each command buffer to complete. The Shared storage mode is correct for these use cases. The caveat related to malloc only applies to the no-copy case, since in every other case, Metal allocates the memory for you.
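The pool-and-cycle idea can be sketched independently of Metal. The plain C code below only tracks which slots are still in use; it is an illustration under stated assumptions, not the answer's implementation. BufferPool, pool_acquire, pool_release, and the pool size of 3 are all hypothetical. In a real app each slot index would correspond to one preallocated MTLBuffer, and the release would happen once the command buffer that used that slot has completed.

```c
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 3  // hypothetical choice: three buffers in rotation

typedef struct {
    bool in_flight[POOL_SIZE];  // true while the GPU may still be using slot i
    size_t next;                // slot to try first on the next acquire
} BufferPool;

// Returns the index of a free slot and marks it in flight,
// or -1 if every slot is still in use.
static int pool_acquire(BufferPool *pool) {
    for (size_t tried = 0; tried < POOL_SIZE; tried++) {
        size_t slot = (pool->next + tried) % POOL_SIZE;
        if (!pool->in_flight[slot]) {
            pool->in_flight[slot] = true;
            pool->next = (slot + 1) % POOL_SIZE;
            return (int)slot;
        }
    }
    return -1;
}

// Marks a slot as free again; out-of-range indices are ignored.
static void pool_release(BufferPool *pool, int slot) {
    if (slot >= 0 && slot < POOL_SIZE) {
        pool->in_flight[slot] = false;
    }
}
```

This sketch is not thread-safe. Since completion callbacks typically run on a different thread from the one encoding new work, a real implementation would need synchronization around these calls, for example a lock or a counting semaphore.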