How to recover original values after a model predict in Keras?

Question

This is a more conceptual question, but I have to confess I have been struggling with it for a while.

Suppose you want to train a neural network (NN), using Keras for instance. As is recommended, you normalize or standardize the data before training. With standardization, for instance:

x_new = (x_old - mean)/standarddev

Then you carry out the training (model.fit in Keras) and minimize the loss function, all very nice.

Edit: In my case, I have a set of values between 200 and 400. It's a NN with 1 input and 1 output. I standardize, as described, both the input values AND the expected values, so the NN learns its weights and biases on standardized data.

Now, imagine that I have a completely new dataset of values between 200 and 400 and I want to predict an output using the previously trained NN. I can use model.predict(x) in Keras, where x is the completely new set of values I have received, standardized (or normalized) because the NN was trained that way. But what I get after predict is an array of standardized values, and I want to map them back to the usual range of 200 to 400. I don't know how to do this.

I know you can train without normalizing or standardizing, but I have read that training improves if you standardize (or normalize) so that the values lie in the output range of the units (neurons), for instance between 0 and 1 for a sigmoid.

Thanks.

Recommended answer

OK, I think I understood your problem correctly, so I will try to explain how to deal with data normalization:

1. Assumption about the distribution of inputs and outputs: in neural network training you usually assume that your data (both input and output) come from some probability distributions; let's call them X for the input and Y for the output. There are some reasons to make these distributions zero mean with unit standard deviation during the training phase.

2. Statistical part of data normalization and recovery: because of that, you have to solve another task while training your network. This task is to estimate the mean and standard deviation of both the input distribution X and the output distribution Y. You do that by simply computing the empirical mean and standard deviation of your training data.

3. Application phase - inputs: when you apply your model to new input you are also assuming that the input comes from distribution X, so you also need to standardize it to zero mean and unit standard deviation. Here is the funny part: you could use both the training set and the new data to obtain an even better estimate of the mean and standard deviation of X, but to avoid overfitting in the validation case you usually use the mean and standard deviation obtained during the training phase to standardize new data.

4. Application phase - outputs: this part is trickier, because when you apply your network to new standardized inputs you get new outputs from Y* ~ (Y - mean'(Y)) / sd'(Y), where mean'(Y) and sd'(Y) are the estimates of the mean and standard deviation obtained empirically from your training set and Y is the original distribution of your output. That is because during training you feed your optimizer with output data from this standardized distribution. So to destandardize your outputs you need to apply the transformation Y* * sd'(Y) + mean'(Y), which is the inverse of the standardization transformation.
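As a minimal sketch of the transformation pair described in point 4, the following uses only the Python standard library. The helper names `standardize` and `destandardize` and the sample values are made up for illustration, and the population standard deviation (`pstdev`) is an assumption; whichever estimator you pick, the same stored values must be used in both directions.

```python
from statistics import fmean, pstdev


def standardize(values, mean, sd):
    """Map raw values to zero mean and unit standard deviation."""
    return [(v - mean) / sd for v in values]


def destandardize(values_std, mean, sd):
    """Inverse of standardize: y = y_std * sd + mean."""
    return [v * sd + mean for v in values_std]


# Made-up training outputs in the 200-400 range from the question.
y_train = [200.0, 250.0, 300.0, 350.0, 400.0]

# mean'(Y) and sd'(Y), estimated from the training set and stored.
mean_y = fmean(y_train)
sd_y = pstdev(y_train)  # population standard deviation (an assumption)

y_train_std = standardize(y_train, mean_y, sd_y)

# Applying the inverse with the same stored statistics recovers the
# original values, up to floating-point rounding.
y_recovered = destandardize(y_train_std, mean_y, sd_y)
```

The same `destandardize` call would be applied to whatever the network returns for new standardized inputs, since those outputs live in the standardized Y* space.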

Summary:

Your training and application phases look as follows:

  1. You obtain the statistics needed for both the training phase and the application phase by computing the empirical mean and standard deviation of your training inputs (mean'(X) and sd'(X)) and the empirical mean and standard deviation of your training outputs (mean'(Y) and sd'(Y)). It's important to store them because they will be needed in the application phase.
  2. You standardize both your input and output data to zero mean and unit standard deviation and train your model on them.
  3. During the application phase you standardize your inputs by subtracting the stored mean'(X) and dividing by the stored sd'(X), then feed them to the model to obtain new outputs Y*.
  4. You destandardize your outputs using the stored mean'(Y) and sd'(Y), obtained during the training phase, by the transformation Y* * sd'(Y) + mean'(Y).
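The four steps above can be sketched end to end as follows. To keep the example self-contained, a tiny least-squares line (`TinyLinearModel`, a hypothetical stand-in, not part of Keras) plays the role of the trained network; the data, the helper names, and the choice of population standard deviation are all assumptions made for illustration.

```python
from statistics import fmean, pstdev


def standardize(values, mean, sd):
    return [(v - mean) / sd for v in values]


def destandardize(values_std, mean, sd):
    return [v * sd + mean for v in values_std]


class TinyLinearModel:
    """Hypothetical stand-in for a trained network.

    Fits y_std = w * x_std by least squares; no intercept is needed
    because both standardized variables have zero mean.
    """

    def fit(self, x_std, y_std):
        self.w = sum(a * b for a, b in zip(x_std, y_std)) / sum(a * a for a in x_std)
        return self

    def predict(self, x_std):
        return [self.w * a for a in x_std]


# Step 1: estimate and store the statistics from the training data only.
x_train = [200.0, 250.0, 300.0, 350.0, 400.0]
y_train = [400.0, 350.0, 300.0, 250.0, 200.0]  # made-up relation: y = 600 - x
mean_x, sd_x = fmean(x_train), pstdev(x_train)
mean_y, sd_y = fmean(y_train), pstdev(y_train)

# Step 2: train on standardized inputs and outputs.
model = TinyLinearModel().fit(
    standardize(x_train, mean_x, sd_x),
    standardize(y_train, mean_y, sd_y),
)

# Step 3: standardize new inputs with the stored training statistics.
x_new = [220.0, 380.0]
y_new_std = model.predict(standardize(x_new, mean_x, sd_x))

# Step 4: map the predictions back to the original 200-400 scale.
y_new = destandardize(y_new_std, mean_y, sd_y)
print(y_new)  # approximately [380.0, 220.0]
```

With a real Keras model, `model.fit` and `model.predict` would take the place of the stand-in's methods (with the data passed as arrays); steps 1, 3 and 4 stay the same.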

I hope this answer solves your problem and leaves you with no doubts about the details of standardizing and destandardizing your data :)
