Metal: emulating geometry shaders using compute shaders


Question

I'm trying to implement voxel cone tracing in Metal. One of the steps in the algorithm is to voxelize the geometry using a geometry shader. Metal does not have geometry shaders, so I have been looking into emulating them with a compute shader. I pass my vertex buffer into the compute shader, do what a geometry shader would normally do, and write the result to an output buffer. I also add a draw command to an indirect buffer. I then use the output buffer as the vertex buffer for my vertex shader. This works fine, but I need twice as much memory for my vertices: one allocation for the vertex buffer and one for the output buffer. Is there any way to pass the output of the compute shader directly to the vertex shader without storing it in an intermediate buffer? I don't need to keep the contents of the compute shader's output buffer. I just need to hand the results to the vertex shader.

Is this possible? Thanks.

Edit

Essentially, I'm trying to emulate the following GLSL shader:

#version 450

layout(triangles) in;
layout(triangle_strip, max_vertices = 3) out;

layout(location = 0) in vec3 in_position[];
layout(location = 1) in vec3 in_normal[];
layout(location = 2) in vec2 in_uv[];

layout(location = 0) out vec3 out_position;
layout(location = 1) out vec3 out_normal;
layout(location = 2) out vec2 out_uv;

void main()
{
    vec3 p = abs(cross(in_position[1] - in_position[0], in_position[2] - in_position[0]));

    for (uint i = 0; i < 3; ++i)
    {
        out_position = in_position[i];
        out_normal = in_normal[i];
        out_uv = in_uv[i];

        if (p.z > p.x && p.z > p.y)
        {
            gl_Position = vec4(out_position.x, out_position.y, 0, 1);
        }
        else if (p.x > p.y && p.x > p.z)
        {
            gl_Position = vec4(out_position.y, out_position.z, 0, 1);
        }
        else
        {
            gl_Position = vec4(out_position.x, out_position.z, 0, 1);
        }

        EmitVertex();
    }

    EndPrimitive();
}

For each triangle, I need to output a triangle with its vertices at these new positions instead. The triangle vertices come from a vertex buffer and are drawn using an index buffer. I also plan to add code that does conservative rasterization (just increasing the size of the triangle a little), but it's not shown here. Currently, my Metal compute shader uses the index buffer to fetch each vertex, runs the same code as the geometry shader above, and writes the new vertex to another buffer, which I then use to draw.

Recommended answer

Here's a very speculative possibility, depending on exactly what your geometry shader needs to do.

I'm thinking you can do it sort of "backwards", with just a vertex shader and no separate compute shader, at the cost of redundant work on the GPU. You would issue a draw as if you had a buffer containing all of the output vertices of the geometry shader's output primitives. You would not actually have that buffer on hand, though. Instead, you would write a vertex shader that calculates those vertices on the fly.

So, in the app code, calculate the number of output primitives, and therefore the number of output vertices, that would be produced for a given count of input primitives. Then do a draw of the output primitive type with that many vertices.
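
As a rough sketch of that bookkeeping, the helper below computes the size of the draw on the CPU side. It is plain C++ rather than Metal API code; the names `DrawSize` and `outputDrawSize` are hypothetical, and it assumes the output is drawn as a triangle list with a fixed maximum number of output triangles per input triangle.

```cpp
#include <cstdint>

// Hypothetical helper: size of the draw that stands in for the geometry shader's output.
struct DrawSize {
    std::uint32_t primitiveCount;  // output triangles to draw
    std::uint32_t vertexCount;     // vertex count to pass to the draw call
};

// inputPrimitives:   number of triangles the emulated geometry shader would consume.
// maxOutputPerInput: most output triangles a single input triangle can produce.
inline DrawSize outputDrawSize(std::uint32_t inputPrimitives,
                               std::uint32_t maxOutputPerInput)
{
    const std::uint32_t verticesPerTriangle = 3;  // assumes a triangle-list draw
    const std::uint32_t prims = inputPrimitives * maxOutputPerInput;
    return DrawSize{prims, prims * verticesPerTriangle};
}
```

For the shader in the question, every input triangle yields exactly one output triangle, so `outputDrawSize(n, 1).vertexCount` is simply `3 * n`.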

You would not provide a buffer of output vertex data as input to this draw.

Instead, you would provide the original index buffer and the original vertex buffer as inputs to the vertex shader for that draw. From the vertex ID, the shader would calculate which output primitive it is working on and which vertex of that primitive (for a triangle, vid / 3 and vid % 3, respectively). From the output primitive ID, it would calculate which input primitive would have generated it in the original geometry shader.
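
The integer math involved can be sketched on the CPU as follows. This is only an illustration in plain C++: `OutputVertexRef` and `decodeVertexId` are hypothetical names, and the mapping assumes that each input triangle owns a contiguous block of `maxOutputPerInput` output triangles.

```cpp
#include <cstdint>

// Hypothetical names; mirrors the math a vertex shader would do on [[vertex_id]].
struct OutputVertexRef {
    std::uint32_t outputPrimitive;    // which output triangle this invocation belongs to
    std::uint32_t vertexInPrimitive;  // 0, 1 or 2 within that triangle
    std::uint32_t inputPrimitive;     // input triangle that would have generated it
};

inline OutputVertexRef decodeVertexId(std::uint32_t vid,
                                      std::uint32_t maxOutputPerInput)
{
    OutputVertexRef ref;
    ref.outputPrimitive   = vid / 3;
    ref.vertexInPrimitive = vid % 3;
    // Assumption: output triangles are grouped per input triangle, in order.
    ref.inputPrimitive    = ref.outputPrimitive / maxOutputPerInput;
    return ref;
}
```

With one output triangle per input triangle, as in the question, `inputPrimitive` is the same as `outputPrimitive`.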

The shader would look up the indices for that input primitive in the index buffer and then fetch the vertex data from the vertex buffer. (This lookup is sensitive to the distinction between a triangle list and a triangle strip, for example.) It would apply any pre-geometry-shader vertex shading to that data. Then it would do the part of the geometry computation that contributes to the identified vertex of the identified output primitive. Once it has calculated the output vertex data, it can apply any further per-vertex processing that would have followed the geometry shader. The result of that is what the shader returns.
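
To illustrate why the topology matters, here is a small C++ sketch of which index-buffer slots belong to a given input triangle. The `Topology` enum and `triangleIndexSlots` function are hypothetical, and the swap for odd strip triangles assumes the usual convention that odd-numbered triangles in a strip are reordered to keep a consistent winding.

```cpp
#include <array>
#include <cstdint>

enum class Topology { TriangleList, TriangleStrip };  // hypothetical, for this sketch only

// Returns the positions in the index buffer that make up input triangle `tri`.
inline std::array<std::uint32_t, 3> triangleIndexSlots(Topology topology,
                                                       std::uint32_t tri)
{
    if (topology == Topology::TriangleList) {
        // A list stores three separate indices per triangle.
        return {3 * tri, 3 * tri + 1, 3 * tri + 2};
    }
    // A strip shares two indices with the previous triangle. Odd triangles
    // swap their first two slots so every triangle keeps the same winding.
    if (tri % 2 == 0) {
        return {tri, tri + 1, tri + 2};
    }
    return {tri + 1, tri, tri + 2};
}
```

The winding swap only matters if face culling or a winding-dependent computation is in play; for a voxelization pass with culling disabled, reading slots `tri`, `tri + 1`, `tri + 2` in order would likely be enough.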

If the geometry shader can produce a variable number of output primitives per input primitive, you at least know the maximum number. So you can draw the maximum potential count of vertices for the maximum potential count of output primitives. The vertex shader can do the computations necessary to figure out whether the geometry shader would, in fact, have produced that primitive. If not, the vertex shader can arrange for the whole primitive to be clipped away, either by positioning it outside of the frustum or by using a [[clip_distance]] attribute on the output vertex data.
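
The "position it outside of the frustum" option might look like the following, sketched in plain C++ rather than Metal Shading Language. `Float4`, `insideClipVolume` and `positionOrCulled` are hypothetical names; the clip-volume test assumes Metal's convention of x and y in [-w, w] and z in [0, w].

```cpp
// Minimal stand-in for a clip-space position (float4 in Metal).
struct Float4 { float x, y, z, w; };

// Assumed clip volume: -w <= x <= w, -w <= y <= w, 0 <= z <= w.
inline bool insideClipVolume(const Float4& p)
{
    return p.w > 0.0f &&
           p.x >= -p.w && p.x <= p.w &&
           p.y >= -p.w && p.y <= p.w &&
           p.z >= 0.0f && p.z <= p.w;
}

// If the emulated geometry shader would not have emitted this primitive,
// return one fixed off-screen point for every vertex of that primitive.
inline Float4 positionOrCulled(bool primitiveEmitted, const Float4& position)
{
    const Float4 offscreen{2.0f, 2.0f, 2.0f, 1.0f};  // outside on x, y and z
    return primitiveEmitted ? position : offscreen;
}
```

Because all three vertices of a skipped primitive land on the same point outside the clip volume, the triangle is both degenerate and fully clipped, so it should produce no fragments.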

This avoids ever storing the generated primitives in a buffer. However, it makes the GPU repeat some of the pre-geometry-shader vertex shading and geometry shader calculations, because each output vertex recomputes work that is shared across its primitive. That work is parallelized, of course, but it may still be slower than what you're doing now. It may also defeat some optimizations around fetching indices and vertex data that are possible with more conventional vertex shaders.

Here's an example conversion of your geometry shader:

#include <metal_stdlib>
using namespace metal;

struct VertexIn {
    // maybe need packed types here depending on your vertex buffer layout
    // can't use [[attribute(n)]] for these because Metal isn't doing the vertex lookup for us
    float3 position;
    float3 normal;
    float2 uv;
};

struct VertexOut {
    float3 position;
    float3 normal;
    float2 uv;
    float4 new_position [[position]];
};


vertex VertexOut foo(uint vid [[vertex_id]],
                     device const uint *indexes [[buffer(0)]],
                     device const VertexIn *vertexes [[buffer(1)]])
{
    VertexOut out;

    const uint triangle_id = vid / 3;
    const uint vertex_of_triangle = vid % 3;

    // indexes describes a triangle strip even though this shader is invoked for a triangle list,
    // so triangle n reads indexes[n], indexes[n + 1] and indexes[n + 2].
    const uint index[3] = { indexes[triangle_id], indexes[triangle_id + 1], indexes[triangle_id + 2] };
    const VertexIn v[3] = { vertexes[index[0]], vertexes[index[1]], vertexes[index[2]] };

    float3 p = abs(cross(v[1].position - v[0].position, v[2].position - v[0].position));

    out.position = v[vertex_of_triangle].position;
    out.normal = v[vertex_of_triangle].normal;
    out.uv = v[vertex_of_triangle].uv;

    if (p.z > p.x && p.z > p.y)
    {
        out.new_position = float4(out.position.x, out.position.y, 0, 1);
    }
    else if (p.x > p.y && p.x > p.z)
    {
        out.new_position = float4(out.position.y, out.position.z, 0, 1);
    }
    else
    {
        out.new_position = float4(out.position.x, out.position.z, 0, 1);
    }

    return out;
}
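
One way to sanity-check the axis selection without a GPU is a small CPU reference. The C++ sketch below mirrors the three branches of the shader; `Vec2`, `Vec3` and `projectAlongDominantAxis` are hypothetical names introduced only for this check.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };
struct Vec2 { float x, y; };

inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

// Component-wise absolute value of the cross product a x b.
inline Vec3 absCross(Vec3 a, Vec3 b)
{
    return {std::fabs(a.y * b.z - a.z * b.y),
            std::fabs(a.z * b.x - a.x * b.z),
            std::fabs(a.x * b.y - a.y * b.x)};
}

// Projects vertex v of triangle (a, b, c) the same way the shader's branches do.
inline Vec2 projectAlongDominantAxis(Vec3 a, Vec3 b, Vec3 c, Vec3 v)
{
    const Vec3 p = absCross(sub(b, a), sub(c, a));
    if (p.z > p.x && p.z > p.y) return {v.x, v.y};  // normal mostly along z
    if (p.x > p.y && p.x > p.z) return {v.y, v.z};  // normal mostly along x
    return {v.x, v.z};                              // otherwise drop y
}
```

Feeding the same triangles through this function and through the shader gives a quick way to compare the projected x and y of `new_position`.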
