How to implement a gaussian renderer with mean and variance values as input in any deep modeling framework (needs to be back-propagable)

Problem Description

Imagine a typical encoder-decoder model. However, instead of a generic decoder where deconvolutions together with upscaling are used to create/synthesize a tensor similar to the model's input, I need to implement a structured/custom decoder.

Here, I need the decoder to take its input, e.g. a 10x2 tensor where each row represents x,y positions or coordinates, and render a fixed, predefined-size image in which 10 gaussian distributions are generated at the locations specified by the input.

Put another way, I need to create an empty fixed-size tensor, fill the locations specified by the 10 coordinates with a value of 1, and then sweep a gaussian kernel over the whole tensor. For example, imagine the following 1-D scenario. Let the input to the whole model be a vector of size 10. If the input to the decoder is [3, 7], i.e. two x-coordinates (0-indexed), and the gaussian kernel of size 3 that we want to use is [0.28, 0.44, 0.28], then the output of the decoder should look like the following (the same size as the original input of the model, which is 10):

[0, 0, 0.28, 0.44, 0.28, 0, 0.28, 0.44, 0.28, 0]  

which is the same as [0, 0, 0, 1, 0, 0, 0, 1, 0, 0] * [0.28, 0.44, 0.28], where * represents the convolution operator. Please note that in the first vector, the 1s are located at positions 3 and 7, considering a 0-indexing format.
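The 1-D example above can be verified with a short NumPy snippet. NumPy is used here only for illustration; the eventual renderer would need differentiable framework ops:

```python
import numpy as np

# Impulse vector with 1s at the two input coordinates (0-indexed: 3 and 7)
impulses = np.zeros(10)
impulses[[3, 7]] = 1.0

# Gaussian kernel of size 3
kernel = np.array([0.28, 0.44, 0.28])

# mode="same" keeps the output the same length as the input (10)
rendered = np.convolve(impulses, kernel, mode="same")
print(rendered)  # values: 0, 0, 0.28, 0.44, 0.28, 0, 0.28, 0.44, 0.28, 0
```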

Finally, a typical pixel loss such as MSE will be calculated. The important part is that this rendering module needs to be able to backpropagate the errors from the loss to its inputs, which are the coordinates.

This module itself does not have any trainable parameters. Also, I do not want to change the layers coming before this rendering module and they need to stay as they are. In a more advanced setting, I would also like to provide the 4 covariance values as input too, i.e. the input to the renderer would be in the form of [num_points, 5] where each row is [x_coord, y_coord, cov(x,x), cov(x,y), cov(y,y)].

How can I implement such a module in any of the available deep learning frameworks? A hint towards something similar would also be very useful.

Recommended Answer

In my experience, point-like (single-pixel) targets in neural networks perform poorly, because they cut off the influence of distant pixels.

Thus, instead of using a gaussian kernel, it would be better to have an actual gaussian function applied to all pixels.

So, taking a 2D gaussian distribution function:

    G(x, y) = exp(-((x - x0)^2 + (y - y0)^2) / (2 * sigma^2)) / (2 * pi * sigma^2)

We can use it like this:

This means some steps in a custom function:

import math
import keras.backend as K

image_size = 64     # height = width of the rendered image -- set this to your own size
square_sigma = 1.0  # sigma squared of the rendered gaussians
pi = math.pi

def coords_to_gaussian(x): #where x is shape (batch, 10, 2), and 2 = x, y

    #pixel coordinates - must match the values of x and y
    #here I suppose from 0 to image size, but you may want it normalized, maybe
    x_pixels = K.cast(K.arange(image_size), K.floatx()) #cast to float so it can be subtracted from the coordinates
    x_pixels = K.reshape(x_pixels, (1,1,image_size,1))
    x_pixels = K.concatenate([x_pixels]*image_size, axis=-1) #shape(1,1,size,size)

    y_pixels = K.permute_dimensions(x_pixels, (0,1,3,2))

    pixels = K.stack([x_pixels, y_pixels], axis=-1) #shape(1,1,size,size,2)


    #adjusting the AE locations to a compatible shape:
    locations = K.reshape(x, (-1, 10, 1, 1, 2))


    #calculating the upper part of the equation
    result = K.square(pixels - locations) #shape (batch, 10, size, size, 2)
    result = - K.sum(result, axis=-1) / (2*square_sigma) #shape (batch, 10, size, size)

    #calculating the E:
    result = K.exp(result) / (2 * pi * square_sigma)

    #sum the 10 channels (principle of superposition)
    result = K.sum(result, axis=1) #shape (batch, size, size)

    #add a channel for future convolutions
    result = K.expand_dims(result, axis=-1) #shape (batch, size, size, 1)

    return result

Use it in a Lambda layer:

from keras.layers import Lambda
Lambda(coords_to_gaussian)(coordinates_tensor_from_encoder)
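As a sanity check on the math (not on the Keras code itself), the same computation can be written in plain NumPy and inspected; `image_size` and `square_sigma` below are hypothetical values chosen just for the check. Each rendered map should peak at the input coordinates:

```python
import numpy as np

image_size = 8     # hypothetical small image for the check
square_sigma = 1.0

def render_gaussians(coords):
    """coords: (num_points, 2) array of (x, y) pixel positions."""
    xs = np.arange(image_size, dtype=np.float64)
    # Pixel grid of shape (size, size, 2); grid[i, j] == (i, j)
    grid = np.stack(np.meshgrid(xs, xs, indexing="ij"), axis=-1)
    # (num_points, size, size): squared distance from every pixel to each center
    d2 = np.sum((grid[None] - coords[:, None, None, :]) ** 2, axis=-1)
    per_point = np.exp(-d2 / (2 * square_sigma)) / (2 * np.pi * square_sigma)
    return per_point.sum(axis=0)  # superposition of all points

img = render_gaussians(np.array([[2.0, 3.0], [6.0, 5.0]]))
peak = np.unravel_index(np.argmax(img), img.shape)
print(peak)  # (2, 3): the maximum sits on one of the input centers
```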

I'm not considering the covariances here, but you might find a way to put them in the formulas and adjust the code.
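For the full-covariance version hinted at in the question (rows of [x_coord, y_coord, cov(x,x), cov(x,y), cov(y,y)]), the squared-distance term generalizes to the quadratic form d^T * inv(Sigma) * d. A minimal NumPy sketch of that idea, with a hypothetical `image_size` (the same operations would translate to differentiable K.* ops):

```python
import numpy as np

image_size = 8  # hypothetical image size for the sketch

def render_cov_gaussians(params):
    """params: (num_points, 5) rows of [x, y, cov_xx, cov_xy, cov_yy]."""
    xs = np.arange(image_size, dtype=np.float64)
    grid = np.stack(np.meshgrid(xs, xs, indexing="ij"), axis=-1)  # (size, size, 2)
    out = np.zeros((image_size, image_size))
    for x, y, cxx, cxy, cyy in params:
        cov = np.array([[cxx, cxy], [cxy, cyy]])      # must be positive definite
        inv = np.linalg.inv(cov)
        d = grid - np.array([x, y])                   # (size, size, 2)
        # quadratic form d^T inv(cov) d at every pixel
        quad = np.einsum("...i,ij,...j->...", d, inv, d)
        norm = 2 * np.pi * np.sqrt(np.linalg.det(cov))
        out += np.exp(-0.5 * quad) / norm             # superposition again
    return out

img = render_cov_gaussians(np.array([[3.0, 4.0, 2.0, 0.5, 1.0]]))
peak = np.unravel_index(np.argmax(img), img.shape)
print(peak)  # (3, 4): the density still peaks at the mean
```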

