Metal kernels not behaving properly on the new MacBook Pro (late 2016) GPUs


Question

I'm working on a macOS project that uses Swift and Metal for image processing on the GPU. Last week, I received my new 15-inch MacBook Pro (late 2016) and noticed something strange with my code: kernels that were supposed to write to a texture did not seem to do so...

After a lot of digging, I found that the problem depends on which GPU Metal uses for the computation: the AMD Radeon Pro 455 or the Intel(R) HD Graphics 530.

Initializing the MTLDevice with MTLCopyAllDevices() returns an array of devices representing the Radeon and the Intel GPUs, while MTLCreateSystemDefaultDevice() returns the default device, which is the Radeon. Either way, the code works as expected on the Intel GPU but not on the Radeon GPU.
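As a minimal sketch of the device enumeration described above, the snippet below lists every Metal device and reports its isLowPower flag, which on a dual-GPU MacBook Pro should be true for the integrated GPU. The helper name describeMetalDevices is hypothetical, and the code assumes a current macOS SDK, where device.name is a non-optional String (unlike the Swift 3 code in this question).

```swift
import Metal

// Hypothetical helper (not part of the original post): one summary line per
// Metal device that macOS exposes. MTLCopyAllDevices() is macOS-only.
func describeMetalDevices() -> [String] {
    return MTLCopyAllDevices().map { device in
        // On a Mac with both an integrated and a discrete GPU, isLowPower
        // is true for the integrated one.
        let kind = device.isLowPower ? "low power" : "not low power"
        return "\(device.name): \(kind)"
    }
}

for line in describeMetalDevices() {
    print(line)
}
```

Selecting a device by this flag rather than by array index avoids relying on the order in which MTLCopyAllDevices() happens to return the GPUs.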

Let me show you an example.

To start, here is a simple kernel that takes an input texture and copies its colour to an output texture:

    kernel void passthrough(texture2d<uint, access::read> inTexture [[texture(0)]],
                            texture2d<uint, access::write> outTexture [[texture(1)]],
                            uint2 gid [[thread_position_in_grid]])
    {
        uint4 out = inTexture.read(gid);
        outTexture.write(out, gid);
    }

In order to use this kernel, I use this piece of code:

    let devices = MTLCopyAllDevices()
    for device in devices {
        print(device.name!) // [0] -> "AMD Radeon Pro 455", [1] -> "Intel(R) HD Graphics 530"
    }

    let device = devices[0] 
    let library = device.newDefaultLibrary()
    let commandQueue = device.makeCommandQueue()

    let passthroughKernelFunction = library!.makeFunction(name: "passthrough")

    let cps = try! device.makeComputePipelineState(function: passthroughKernelFunction!)

    let commandBuffer = commandQueue.makeCommandBuffer()
    let commandEncoder = commandBuffer.makeComputeCommandEncoder()

    commandEncoder.setComputePipelineState(cps)

    // Texture setup
    let width = 16
    let height = 16
    let byteCount = height*width*4
    let bytesPerRow = width*4
    let region = MTLRegionMake2D(0, 0, width, height)
    let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .rgba8Uint, width: width, height: height, mipmapped: false)

    // inTexture
    var inData = [UInt8](repeating: 255, count: Int(byteCount))
    let inTexture = device.makeTexture(descriptor: textureDescriptor)
    inTexture.replace(region: region, mipmapLevel: 0, withBytes: &inData, bytesPerRow: bytesPerRow)

    // outTexture
    var outData = [UInt8](repeating: 128, count: Int(byteCount))
    let outTexture = device.makeTexture(descriptor: textureDescriptor)
    outTexture.replace(region: region, mipmapLevel: 0, withBytes: &outData, bytesPerRow: bytesPerRow)

    commandEncoder.setTexture(inTexture, at: 0)
    commandEncoder.setTexture(outTexture, at: 1)
    commandEncoder.dispatchThreadgroups(MTLSize(width: 1,height: 1,depth: 1), threadsPerThreadgroup: MTLSize(width: width, height: height, depth: 1))

    commandEncoder.endEncoding()
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()

    // Get the data back from the GPU
    outTexture.getBytes(&outData, bytesPerRow: bytesPerRow, from: region , mipmapLevel: 0)

    // Validation
    // outData should be exactly the same as inData 
    for (i,outElement) in outData.enumerated() {
        if outElement != inData[i] {
            print("Dest: \(outElement) != Src: \(inData[i]) at \(i)")
        }
    }

When running this code with let device = devices[0] (Radeon GPU), outTexture is never written to (or so I suppose), and as a result outData stays unchanged. On the other hand, when running this code with let device = devices[1] (Intel GPU), everything works as expected and outData is updated with the values in inData.

Recommended answer

I think that whenever the GPU writes to a MTLStorageModeManaged resource such as a texture and you then want to read that resource from the CPU (e.g. using getBytes()), you need to synchronize it using a blit encoder. Try putting the following above the commandBuffer.commit() line:

    let blitEncoder = commandBuffer.makeBlitCommandEncoder()
    blitEncoder.synchronize(resource: outTexture)
    blitEncoder.endEncoding()

You may get away without this on an integrated GPU because the GPU is using system memory for the resource and there's nothing to synchronize.
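A minimal sketch of the same idea as a reusable helper, assuming a current macOS SDK in which makeBlitCommandEncoder() returns an optional (unlike the Swift 3 code above). The function name encodeSynchronizeIfManaged is hypothetical. It encodes the synchronization pass only when the texture really uses managed storage, so it is harmless for shared or private textures.

```swift
import Metal

// Hypothetical helper (not from the original answer). Call it after the
// compute encoder's endEncoding() and before commandBuffer.commit().
// Returns true if a synchronization pass was encoded.
@discardableResult
func encodeSynchronizeIfManaged(_ texture: MTLTexture,
                                in commandBuffer: MTLCommandBuffer) -> Bool {
    #if os(macOS)
    // Only managed resources keep separate CPU and GPU copies that
    // have to be synchronized explicitly.
    guard texture.storageMode == .managed,
          let blitEncoder = commandBuffer.makeBlitCommandEncoder() else {
        return false
    }
    blitEncoder.synchronize(resource: texture)
    blitEncoder.endEncoding()
    return true
    #else
    // Managed storage and synchronize(resource:) are macOS features.
    return false
    #endif
}
```

In the question's code this would be called as encodeSynchronizeIfManaged(outTexture, in: commandBuffer) between commandEncoder.endEncoding() and commandBuffer.commit(), so that getBytes() sees the GPU's writes once waitUntilCompleted() returns.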
