How to implement a gaussian renderer with mean and variance values as input in any deep modeling framework (needs to be back-propagable)

Question

Imagine a typical auto-encoder-decoder model. However, instead of a general decoder where deconvoutions together with upscaling are used to create/synthesize a tensor similar to the model's input, I need to implement a structured/custom decoder.

Here, I need the decoder to take its input, e.g. a 10x2 tensor where each row represents x,y positions or coordinates, and render a fixed predefined size image where there are 10 gaussian distributions generated at the location specified by the input.

In another way, I need to create an empty fixed-size tensor, fill the locations specified by the 10 coordinates with a value of 1, and then sweep a gaussian kernel over the whole tensor. For example, imagine the following 1-d scenario. Let the input to the whole model be a vector of size 10. If the input to the decoder is [3, 7], which are two x-coordinates (0-indexing), and the gaussian kernel of size 3 that we want to use is [0.28, 0.44, 0.28], then the output of the decoder should look like the following (it should be the same size as the original input of the model, which is 10):

[0, 0, 0.28, 0.44, 0.28, 0, 0.28, 0.44, 0.28, 0]  

which is the same as [0, 0, 0, 1, 0, 0, 0, 1, 0, 0] * [0.28, 0.44, 0.28], where * represents the convolution operator. Please note that in the first vector, the 1s are located at positions 3 and 7, considering a 0-indexing format.
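The 1-d example above can be reproduced directly with NumPy's scatter-then-convolve approach (a minimal sketch; the `size`, `coords`, and `kernel` values are taken from the question's example):

```python
import numpy as np

size = 10
coords = [3, 7]                       # decoder input: two 0-indexed x positions
kernel = np.array([0.28, 0.44, 0.28])

canvas = np.zeros(size)
canvas[coords] = 1.0                  # [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]

# 'same' mode keeps the output at the original length of 10
rendered = np.convolve(canvas, kernel, mode="same")
print(rendered)  # matches the expected output vector above
```

Note, however, that the indexing step `canvas[coords] = 1.0` is the part that breaks differentiability with respect to the coordinates, which is why the answer below avoids it.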

Finally, a typical pixel loss such as MSE will be calculated. The important part is that this rendering module needs to be able to backpropagate the errors from the loss to its inputs, which are the coordinates.

This module itself does not have any trainable parameters. Also, I do not want to change the layers coming before this rendering module and they need to stay as they are. In a more advanced setting, I would also like to provide the 4 covariance values as input too, i.e. the input to the renderer would be in the form of [num_points, 5] where each row is [x_coord, y_coord, cov(x,x), cov(x,y), cov(y,y)].

How can I implement such a module in any of the available deep learning frameworks? A hint towards something similar would also be very useful.

Answer

In my experience, point-like (punctual) values in neural networks tend to perform badly, because they cut off the influence of distant pixels.

Thus, instead of using a gaussian kernel, it would be better to have an actual gaussian function applied to all pixels.

So, taking a 2D gaussian distribution function:

    G(x, y) = exp(-((x - x0)² + (y - y0)²) / (2σ²)) / (2πσ²)

we can apply it at every pixel (x, y) of the output image, centered on each coordinate (x0, y0) that the encoder produces.

This means some steps in a custom function:

import math
import keras.backend as K

image_size = 28     # fixed output size - set this to match the model's input
square_sigma = 4.0  # sigma^2 of the rendered gaussians
pi = math.pi

def coords_to_gaussian(x): #where x is shape (batch, 10, 2), and 2 = x, y

    #pixel coordinates - must match the range of the x and y values
    #here I suppose from 0 to image_size, but you may want them normalized
    #float dtype so the subtraction against float coordinates works
    x_pixels = K.reshape(K.arange(0, image_size, dtype='float32'),
                         (1, 1, image_size, 1))
    x_pixels = K.concatenate([x_pixels]*image_size, axis=-1) #shape (1,1,size,size)

    y_pixels = K.permute_dimensions(x_pixels, (0,1,3,2))

    pixels = K.stack([x_pixels, y_pixels], axis=-1) #shape (1,1,size,size,2)


    #adjusting the AE locations to a compatible shape:
    locations = K.reshape(x, (-1, 10, 1, 1, 2))


    #calculating the exponent of the equation (squared distance to each center)
    result = K.square(pixels - locations) #shape (batch, 10, size, size, 2)
    result = - K.sum(result, axis=-1) / (2*square_sigma) #shape (batch, 10, size, size)

    #calculating the exponential and the normalization constant:
    result = K.exp(result) / (2 * pi * square_sigma)

    #sum the 10 channels (principle of superposition)
    result = K.sum(result, axis=1) #shape (batch, size, size)

    #add a channel for future convolutions
    result = K.expand_dims(result, axis=-1) #shape (batch, size, size, 1)

    return result

Use it in a Lambda layer:

from keras.layers import Lambda
Lambda(coords_to_gaussian)(coordinates_tensor_from_encoder)
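As a sanity check of the math (not needed in the model itself), here is a NumPy mirror of the same computation; the `image_size` and `square_sigma` values are arbitrary assumptions, and the brightest pixel should land exactly on the requested coordinate:

```python
import numpy as np

def coords_to_gaussian_np(coords, image_size=28, square_sigma=4.0):
    """NumPy mirror of the Keras function; coords has shape (batch, 10, 2)."""
    ax = np.arange(image_size, dtype=np.float32)
    xx, yy = np.meshgrid(ax, ax, indexing="ij")
    pixels = np.stack([xx, yy], axis=-1)             # (size, size, 2) of (x, y)

    locations = coords[:, :, None, None, :]          # (batch, 10, 1, 1, 2)
    sq_dist = np.sum((pixels - locations) ** 2, axis=-1)
    gauss = np.exp(-sq_dist / (2 * square_sigma)) / (2 * np.pi * square_sigma)
    return np.sum(gauss, axis=1)[..., None]          # (batch, size, size, 1)

# all 10 points at (10, 20): the peak pixel should be (10, 20)
img = coords_to_gaussian_np(np.array([[[10.0, 20.0]] * 10]))
peak = np.unravel_index(np.argmax(img[0, :, :, 0]), (28, 28))
```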

I'm not considering the covariances here, but you might find a way to put them in the formulas and adjust the code.
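One hypothetical way to add the covariances is to replace the isotropic squared distance in the exponent with the Mahalanobis distance d^T Σ⁻¹ d, where Σ is built from the [cov(x,x), cov(x,y), cov(y,y)] values of each row. A NumPy sketch of that idea (the function name and the example values are my assumptions; a real version would use the framework's differentiable ops, e.g. K.* or torch.*, so gradients reach the inputs):

```python
import numpy as np

def render_with_covariance(params, image_size=28):
    """params: (num_points, 5) rows of [x, y, cov_xx, cov_xy, cov_yy]."""
    ax = np.arange(image_size, dtype=np.float32)
    xx, yy = np.meshgrid(ax, ax, indexing="ij")
    pixels = np.stack([xx, yy], axis=-1)                 # (size, size, 2)

    out = np.zeros((image_size, image_size), dtype=np.float32)
    for x0, y0, cxx, cxy, cyy in params:
        cov = np.array([[cxx, cxy], [cxy, cyy]])
        inv = np.linalg.inv(cov)
        d = pixels - np.array([x0, y0])                  # (size, size, 2)
        # Mahalanobis distance d^T inv(cov) d, evaluated per pixel
        maha = np.einsum("...i,ij,...j->...", d, inv, d)
        out += np.exp(-0.5 * maha) / (2 * np.pi * np.sqrt(np.linalg.det(cov)))
    return out

# one anisotropic gaussian at (14, 14), wider along x (cov_xx=6 > cov_yy=2)
img = render_with_covariance(np.array([[14.0, 14.0, 6.0, 0.0, 2.0]]))
```

With those values the rendered blob decays more slowly along x than along y, which is the behavior the covariance terms are meant to provide.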
